This paper attempts to put program evaluation into a broad but realistic perspective for managers
and program practitioners in public agency human services programs at the State and local levels.
The bulk of the current literature on evaluation is occupied with the presentation and discussion of
technical methods which might be employed in program evaluation. By contrast, this paper is neither
a manual nor a how-to-do-it guide. Many of these already exist. Instead, it is an attempt to
characterize the meaning and intent of program evaluation; to compare it with similar or allied
approaches to improving decision making and management through the use of formal tools; to present
some of the evidence on what difference evaluation seems to make in practice; and to discuss some of
the basic issues raised by attempts to apply formal methods of evaluation. It concludes with
suggestions to help the public agency practitioner decide whether or not to conduct formal program
evaluation in a specific instance and what issues appear useful to consider.

The paper rests on three interrelated premises. First, public programs are "evaluated" from a
number of different vantage points and through several different mechanisms all of the time, even
though the volume of "formal technical evaluation" may be small or absent. Second, the general
evaluation logic of assessing the worth and value of programs is appealing in principle, but the
conditions under which formal evaluation will pay off in this way are more limited than is generally
assumed. Third, formal program evaluation costs time, energy and money and should be treated like
any other valuable commodity, with care and prudence. Like any other use of public resources, this
one ought to meet a basic test of reasonable payoff: In any specific application, will program
evaluation be "worth it?"

The bulk of the early literature on formal program evaluation (in the late sixties and early seventies)
was optimistic about its usefulness in most circumstances. The evidence examined here, along with
testimony and reports from the field in recent years, suggests, however, that the record is spotty, that
the results of only a small proportion of actual formal evaluations appear to have impact and that
reappraisals and more modest expectations about program evaluation are now appropriate. The
paper identifies some of the proposals to reform evaluation practice and closes with some
suggestions which might help a public agency official think about evaluation in a realistic way.

I have drawn selectively on the extensive evaluation literature and on discussions with
knowledgeable individuals. I have also drawn generously on my own experience over the past 20 years
as a public agency employee (DHEW), a consultant to public agencies and a field evaluator of several
public agency programs. In particular, I have drawn on ideas and material from work on output
measurement in elementary and secondary education done for the National Center for Education
Statistics, U.S. Office of Education, at Georgetown University in the early seventies; writing on needs
assessment methods for the Office of Program Systems, OASPE, DHEW in the late seventies; a
review of research on evaluation and other management methods for the National Science
Foundation carried out at the Urban Institute in the mid-seventies; and the insights and lessons
shared with me by public and private agency officials at Federal, State and local levels I have had the
good fortune to encounter in my work as a consultant. The background study for this paper was
completed in early spring 1980.

Part I traces briefly the early history of formal program evaluation as I understand it and identifies
the many ambitious claims which were (and still are) made for its value and payoff to a public agency.

Part II develops context for understanding formal evaluation. It recalls the many mechanisms
which society and public agencies have for evaluative judgments about programs; accents the
dominant features of program evaluation by comparing them with those of program planning, policy
analysis and needs assessment; and lists some of the alternative methods, mechanisms and
processes available for program evaluation.

Human Services Monograph Series • No. 18, April 1981

Part III discusses the following basic issues which are raised when formal program evaluation is
attempted in practice: the role of our expectations and values; the mixing of technical and value
considerations in the selection of evaluation criteria and indicators or measures; a few of the realities
which inhibit the aspiration to establish the "causes" of program effects; the tendency of some major
traditional evaluation methods to miss or mask the inevitable and essential adaptations of general
program designs to highly variable local circumstances; the impact on evaluation of the limits on
public agency control; and whether program evaluation is "science." Since the discussion to this
point covers a lot of terrain, a brief midpoint summary of major conclusions is provided in the
concluding section.

Part IV summarizes some of the sketchy evidence contained in ten individual sources on the 
impact which formal program evaluation appears to make in practice. 

Part V states briefly a few of the many recent proposed reforms of traditional evaluation theory and
practice.

Part VI provides advice to the State or local agency official or program practitioner who may be
considering carrying out some formal evaluation activities.

Some readers may wonder why experiences of the Federal Government are referenced so often.
The reason is that lessons from available Federal experience (a dozen years) may contribute to
a useful intergovernmental learning experience. Some of these lessons are recited at the end of
part IV.



The casual, busy or knowledgeable reader may not want to read every word. The major parts may 
be read independently. The "bare bones" are contained in part I (Introduction), part III (Summary and
Conclusions), part IV (Intergovernmental Lessons) and the summary advice given in part VI. 
I. Introduction



Why Put Program Evaluation in Perspective? 

Program evaluation carried out by or on behalf of governments in this country is now "big
business." The nation spends well over a quarter billion dollars a year funding "evaluation."
Thousands of field evaluation studies have been conducted over the past 15 years. Hundreds of
books, papers and articles have been written and many published to create an evaluation literature
which is now vast. It contains scores of how-to-do-it manuals, several evaluation research
handbooks, dozens of case studies, proposed methods, growing reviews, critiques and criticisms,
numerous exhortations to do evaluation, occasional horror stories from those who have done it,
advice on how to avoid common pitfalls and a growing number of proposed reforms.

Substantial promises have been made in the past to agency executives, legislators and the public
about the value of systematic, formalized evaluation of public programs. Evaluation will, it has been
claimed, generate "objective" information about the operations and impacts of programs, tell us
where they are strong and weak, indicate what is working and what is not and thereby save us
heartache, frustration and millions, if not billions, by improving the design, performance,
management, efficiency, effectiveness and impact of public programs.

Since the mid-sixties public outlays for evaluation have grown steadily. "Earmarks" and
"set-asides" for evaluation have been written into scores of Federal program authorizations. Numerous
and sometimes large staff offices focusing on evaluation have been established at the Federal level in
particular and at the State and local levels as well. A large contract business has developed. An
evaluation profession is said to be in the making. New professional evaluation societies and
specialized journals and reviews appear at the rate of about one a year. Colleges and universities
regularly offer courses and sometimes degrees in program and policy evaluation and research. A
growth industry has been built around Federal mandates for evaluation. It represents a growing
political constituency.

Yet despite this growth and prosperity (or partly because of it), major problems have emerged
about the relevance, conduct and value of evaluation activities. Arguments are common concerning
what evaluation really is or should be. Some claim we can all do it, others that only social scientists
can do it and still others that we have not done it right yet. Some suggest that evaluation is common
sense and others that it is science. Critics charge that the early promises about the payoff of
evaluation have been broken, while proponents claim that expectations have been too high and that
more time is needed to perfect evaluation tools. Some claim that evaluation has failed, that success is
small and waste is frequent. Others claim that "knowledge is power" and that evaluation research is
the only route to an "objective" basis for action. Others say that science has been perverted by
politics and corrupted by contract economics. And recently, some leaders from the social sciences
have taken critical looks at the excessive claims for and over-extended condition of social research
applied to action programs.

In the meantime, many Federal and some State programs mandate, finance and encourage
program evaluation as a part of program implementation and as a precondition for continued
government support. The responsibility for a significant share of federally mandated and
State-legislated evaluation falls to State and local government agencies. In the face of the ambiguity
which surrounds the promise and payoff of formal evaluation, it is worthwhile to examine briefly the
origins of the emphasis on formal evaluation, what evaluation seems to be and some of the major
issues which it has raised.
A Brief History of Formal Program Evaluation

Once formal program evaluation had been mandated and practiced widely, scholars and
commentators began to search for its roots. When did formal program evaluation begin? Lindblom and
Cohen (1979) note the rise of what they call "professional social inquiry" (which includes evaluation)
around the beginning of the century. Suchman (1967) reports that a concern with evaluation dates from
the early beginnings of various fields of public service. He cites, for example, "a period of mounting
health surveys and program evaluations" from 1907 to 1927, and the development of comparative
community rating sheets by Chapin in 1914. He claims that it was not until after World War I that a
"real demand for critical self-appraisal set in," but this demand appeared to result in "rather arbitrary
evaluation guides" and standards (pp. 13-15). Freeman (1977) sees a sign for the current emphasis
on formal program evaluation during the depression: "In 1935, an obscure sociologist, teaching at
a then-small state university in the southern United States [Arkansas] published a paper pleading for
the experimental evaluation of Franklin D. Roosevelt's New Deal Social Programs." (p. 18) These
early antecedents, however, seem groping, sporadic and localized. Both Suchman and Freeman
point most directly to the 1950s for the stirrings of a more general concern with and advocacy for
systematic program evaluation. Though numerous, these efforts were still circumscribed, episodic
and conducted in the mood of research and demonstration. It was not until the mid-sixties that the
current wave of sustained modern program evaluation began. The Federal Government secured
specific congressional authorization for program evaluation and began to invest sizable funds in it.

McLaughlin (1974), for example, reports that a requirement for the first significant evaluation of a
major Federal program came in the provisions of the Elementary and Secondary Education Act
(ESEA) of 1965. According to McLaughlin, Senator Robert Kennedy made his support for the new
ESEA contingent on stronger reporting and executive oversight. In a meeting of principal drafters of
the Act:

Kennedy argued for an account of program activities as well as a strong USOE oversight role: "unless
there is a meaningful program developed at the local level, which is really tested and checked by you
[USOE], I don't think that this program is going to be effective."

From this meeting there emerged the notions of a reporting and dissemination scheme that was
subsequently included in the ESEA legislation and of the evaluation provision that requires ESEA Title
I projects to be regularly assessed for their effectiveness in meeting the special educational needs of
disadvantaged children. (p. 3)

It was also during this same period, in August 1965, that President Johnson ordered all the major
Federal departments and agencies to install the so-called Planning-Programming-Budgeting
System (PPBS) formerly used in the Department of Defense by Secretary Robert McNamara. That
system was intended to improve the efficiency and effectiveness of resource allocation through
systematic multi-year program planning supported by systems, cost-benefit, cost-effectiveness and
related analyses.

As analysts turned to the actual conduct of studies and analyses, however, they soon discovered to
their surprise and disappointment that data on "output," "benefits" and "effectiveness" hardly
existed. They also noticed that while many departments supported numerous and widely varied
program projects classified as "research and demonstrations," few of these yielded output, outcome,
cost, benefit or effectiveness data which would support economic and "systems" modes of analysis.
In DHEW, these discoveries precipitated a department-wide inventory by analysts in the newly
created Office of Program Coordination (later Planning and Evaluation). Economists at the Bureau of
the Budget (now OMB) and DHEW had noted that industry often spent about 4 to 6 percent on
"research and development" activities. Why should a large department like DHEW not spend at least
1 to 2 percent on evaluation?

The inventory indicated that the funding of studies (which could be classified as studies of outcome
or output) was uneven and low, below the level judged adequate. Based on the inventory, a desire
to increase the volume of studies to support analysis, and with an arbitrary 1 percent figure in mind,
the Secretary issued a memorandum calling for a larger departmental investment in program
analysis and evaluation. (It is noteworthy that the decision to initiate financial support for evaluation
activities was not based on "verified" evidence that formal evaluation had an impact in practice. It is
now clear that little or no formal evidence existed.) To generate funds for such analytical and
evaluative work, the idea emerged to include in some new legislation an earmarked authorization for
the Secretary to spend "directly or indirectly" (either through direct staff effort, grants or contracts) "up
to 1%" of the annual appropriation for a program on "program evaluation."

The first authorization of which I am aware for the so-called "1% set-aside" (which now supports
the bulk of the DHHS evaluation effort) was included in the Department's proposed Partnership for
Health Amendments of 1967. (Note: The Congressional Committee Report on the amendments
contained several paragraphs drafted by the author, then a staff member of OASPE, DHEW, to justify
the new authority.) Once the pattern was set, similar earmarked evaluation authorities were proposed
for an ever-increasing number of social programs. In 1969, for example, a new administration
decided to blanket in all the major departmental authorities for scores of human services programs
under this evaluation authority.

In time, provisions in legislative statutes were written that not only authorized Federal-level
evaluation but also authorized and frequently mandated program evaluation for recipients of Federal
funds at the State and local levels as well. Thus began the Federal support for program evaluation
which by 1976 had reached an estimated national level of over a quarter of a billion dollars. It is to the
general problems and issues of formal program evaluation that this paper is devoted. We begin with a
basic question.

What Are the Claims for Formal Program Evaluation?

According to many proponents, program evaluation:

1. Consists of the study of public programs and their impacts through the use of systematic
(sometimes characterized as "scientific") methods of investigation and research;

2. Will generate a body of valid, reliable and presumably verified propositions and
conclusions about the impacts and operations of programs;

3. Will thereby indicate in an impartial way what effects actually result from a given program
and which features of a program account for them (their causes); and

4. Will make this new and "objective" information available to decision makers responsible for
the program as a basis (partial but crucial) for deciding whether and to what extent a
program is working, why it is working the way it is and, presumably, what can be done to
improve it.

The application of evaluation methods and the provision of new unbiased information to decision
makers will, it is argued, lead to an improved understanding of the program. This will lead, in turn, to
an improvement in the overall "rationality" of decision making. The result of these improvements will
be increased efficiency and effectiveness. Programs will work better. Resources will be saved. Public
agencies will be held "accountable." The general welfare will be better served.

With these hopeful prospects in mind, early proponents of evaluation heralded two additional
broad-based claims and assumptions:

1. Programs could be evaluated in their totality (later called "summative evaluation") to yield
comprehensive information, conclusions and judgments on their overall worth and
workability; and

2. Program evaluation could (and should) be applied to every significant program, the
principle of "Evaluation Universality."

The early claims for formalized evaluation were often made with vigor, assertiveness and an
occasional touch of arrogance. Some viewed public agencies as largely entrenched, self-serving and
lethargic. Decision makers and program managers were sometimes depicted as narrow-minded,
myopic bureaucrats who did not appreciate the power of the tools of formal evaluation. They
appeared to spend most of their time protecting their turf and covering their mistakes, while the "big
issues" of refined program objectives, effectiveness measures and "causation" went by the wayside.
Although decision makers presumably had power, they seemed to spend it in fights over office space
and carpeting and not over dubious program premises and poor program designs.

By contrast, program evaluation was seen as a swift and sure-footed route to clearer objectives,
reliable and impartial data, scientific "facts" and verified conclusions which could be used to root out
ignorance, motivate bureaucrats, "depoliticize" decision making and, as Wildavsky (1979) has put it,
speak truth to power. If a revolution were not in the offing, we seemed at least on the verge of a new
era of rationality and reform. Evaluators aided by social science methods would set us free from
self-interested politics. Or so it seemed.

These are, obviously, ambitious claims for formal program evaluation. To what extent have they
been fulfilled? A growing body of evidence, critical review, reports from the field and self-criticism
suggest that in terms of nearly all the early claims, formal program evaluation has fallen very short.
Here is recent testimony from experienced evaluators.

• After 5 years' experience at the Urban Institute with the theory and the application of formal
program evaluation to a range of Federal programs, Schmidt, Scanlon and Bell (1979) opened
their proposed reform of evaluation ("evaluability assessment") with this judgment:

Congress and the executive branch have increasingly invested in program evaluation over the past
decade. Starting from nearly nothing in the early sixties, investment in evaluation grew to around a
quarter of a billion dollars by 1976. Unfortunately, however, the investment has not yet paid off.
Program evaluation has not led to successful policies or programs. Instead, it has been planned and
implemented in isolation from Federal decisionmaking, and has produced little information of interest
and utility to policymakers and managers. (p. 1)

• Rossi and his colleagues screened "several hundred" Federal evaluation RFP's (requests for
proposal) "searching for examples we could use for didactic exercises in the Summer Institute. We
were able to find less than a dozen that we could use. . . ." A further search of "more than a hundred"
completed evaluation research reports using "minimal" standards yielded no more than a half
dozen of "high quality." Rossi concluded:

The fact of the matter is that most evaluations are still not worth much more than no evaluation at all.
(Datta and Perloff, 1979, pp. 20-21)

• In their study of education evaluation, Alkin, Daillak and White (1979) "discern the very few
discordant cries" that evaluation works and report:

In fact, the literature is replete with gloomy statements about the impotence or futility of evaluation.
There seems to be a consensus in the literature that there has been little impact of evaluative research
on program decisionmaking. (p. 14)

• Evaluators at the Rand Corporation recently reflected on dominant evaluation practices in
education. In a set of engaging papers, they comment on the modest contribution of formal evaluation
and propose several reforms to current practice. Editor John Pincus notes that:

Most policymakers want their programs to succeed, but most "scientific" evaluations address effects
and indicate that student outcomes, as measured by test scores, drop-out rates, and other such
measures appear to be little affected by new government education programs. Such reports of "no
significant effect" are generally unaccompanied by useful recommendations for program improvement
or policy change. Meanwhile, policymakers seek to know not only about effects, but also about what is
going on in the program. . . . In effect, what can result is a "dialog of the deaf," in which neither party
understands the other's premises. Is it possible to reduce these tensions and improve the utility of
evaluation to public policy? (Pincus, 1980, pp. 1-2)

The four Rand essays "find fault with current evaluation methods, each from a different
perspective, and call for improvements" and "for a retreat from the somewhat over-ambitious
pretensions of social science in the earlier years of evaluation studies. . . ." (p. 5)

• In her reflections on "Evaluation and Alchemy," McLaughlin (1980) of Rand recounts that the
central Office of Education budget for evaluation "mushroomed from about $1.2 million in 1968 to
about $21 million in 1977":

But despite the energy and resources devoted to the task, many researchers and practitioners believe
these evaluation efforts are largely a waste of time and money. (p. 41)

She continues:

Our research supports the charge that much of the present evaluation is irrelevant and inappropriate,
that most evaluations ask the wrong questions and use the wrong measures. (p. 42)

Her proposed reforms run deep:

What we know about the process of change implies that evaluation models derived from other realities,
in . . . and psychology, simply do not fit the reality of a public social service system, education in
particular. The logic of inquiry is wrong. And preoccupation with scientism and with fixing our
traditional evaluation paradigms scants what we do know. One major challenge for evaluators, then,
is epistemological: to develop new and valid ways of knowing. (p. 46)

[Note: Epistemology refers to a branch of philosophy that investigates the nature and origin of
knowledge. How do we get it and what does it rest on?]

• In a broad and self-conscious appraisal of "evaluation research," Campbell (1979), a widely
quoted applied social scientist and methodologist, reports:

We cannot yet promise a set of professional skills guaranteed to make an important difference. In the
few success stories of beneficial programs unequivocally evaluated, society has gotten by, or could
have gotten by, without our help. We still lack instances of important contributions to societal
innovation which were abetted by our methodological skills. The need for our specialty, and the
specific recommendations we make, must still be justified by promise rather than by past
performance. (p. 68)

The testimony of these evaluators is sobering if not disturbing. Why has program evaluation made
such a seemingly poor showing? We attempt here a preliminary understanding by continuing with
another basic question.
II. What Is Program Evaluation?

Though this is a simple and straightforward question, it appears to have no ready answer. In one of
the early attempts to discuss evaluation, for example, Suchman (1967) called attention a dozen years
ago to the fact that "evaluation despite its widespread popularity is poorly defined and improperly
used" (p. 27). Block and Richardson recently remarked that "No concept is so misused in social
science as evaluation. . . ." (1979, p. 9) And the claim has been made occasionally that there are as
many definitions of evaluation as there are writers on the subject.

As is often the case, the dictionary offers a good first approximation to the meaning of evaluation.
According to the New College Edition of the American Heritage Dictionary of the English Language
(Morris, ed., 1976), "evaluate" means:

1. To ascertain or fix the value or worth of. 2. To examine and judge; appraise; estimate: "Plato has
been evaluated as having one of the finest minds the world has produced" (S. E. Frost,
Jr.). 3. Mathematics. To calculate or set down the numerical value of; express numerically.
See synonyms at estimate.

At bottom, most writers on evaluation endorse the general dictionary notion: to ascertain or fix the
value or worth of something.

Weiss (1972) captures the same idea:

Evaluation is an elastic word that stretches to cover judgments of many kinds. What all the uses of
the word have in common is the notion of judging merit. Someone is examining and weighing a
phenomenon against some explicit or implicit yardstick. (p. 1)

The crux of the evaluation problem, then, appears to be establishing the worth or value or merit of
something. All evaluations must start with this concern and inevitably return to it.

Although the word "program" is widely and commonly used in public agencies, it, too, has a
variable meaning. Nearly any activity may be called a program: a research program, a regulatory
program, a technical assistance program, a monitoring program, an audit program, a grant program
and so on. In an evaluation context, it appears useful to use the term "program" in a general way to
refer to the organized use of public resources directed toward the accomplishment or
achievement of one or more purposes and/or objectives. The program evaluation of concern here is
directed primarily to human services programs: child day care services, community mental health
services, education services for the disadvantaged, manpower training and placement services,
vocational rehabilitation and a wide variety of other health, welfare, education, housing and social
services.



If we combine the basic meanings of the two words, "program evaluation" can be roughly
characterized as:

attempts to ascertain or fix the value or worth of the use of organized public resources directed toward
one or more purposes and/or objectives.

Before we examine program evaluation further, a brief digression on the origins of evaluation will be
useful. 

Is Evaluation Something New?

In terms of its basic meaning of judging worth and value, human beings have probably always been
engaged in evaluation. From time to time and sometimes continuously we evaluate, wittingly or not,
most aspects of our lives: our jobs, living circumstances, the behavior of our children, the things we
buy and use, the weather, the news, art, political events, our health, taxes and the like. There runs
through all we think, do and say a deep-seated, irresistible, and wholly human element of estimating
and judging what has been and is happening, whether or not we like it and how much or how little.
Both our everyday and professional languages are laden with evaluative and judgmental words,
phrases, content and overtone. It is only a slight exaggeration to say that in our thoughts, choices and
other behavior we are an "evaluating species," well practiced at judging worth and value.

As individuals, however, we are, in principle, at liberty to assess the worth or value of something in
light of our own private values and preferences. We do not usually have to take into account the
values and preferences of others. As we move from strictly individual, private evaluation to making
judgments on behalf of a family, a group, an agency, a community or a State, the scope of relevant
values expands. In the arena of public policy making in a democratic political system, for example, we
expect decision makers to take into account and reconcile the multiple and diverse values of those
who have a stake in a policy or program, whether they are existing or potential clients, service
providers, other agency officials, the public (defined in some way as taxpayers, beneficiaries, etc.) or
other "stakeholders" and "interested parties."

The problem of assessing worth or merit takes on a complex character as we move from individual
to collective or social values as the basis for judgment. As we note later, some of the dilemmas and
difficulties of program evaluation undertaken on behalf of public agencies arise from the attempt to
move from evaluation at the individual level to evaluation at a collective level.

Before identifying the major methods by which formal social and program evaluations occur, it
will be worthwhile to ask another basic question.

What Did Public Agencies Do Before Formal Program Evaluation?

If some of the criticisms and claims of early advocates of formal evaluation were taken at face
value, one might conclude that before the advent of formal program evaluation public agencies had
no way of knowing how well existing programs were faring, that agency officials and program
managers operated in a vacuum of information and knowledge about program workability and
impact, that feedback did not exist or was fatally flawed and that decision makers merely "flew by
the seat of their pants" or trusted only to their "gut reactions."

There are kernels of truth to some of these criticisms but they underestimate the wide variety of
governmental, social, economic and political information-generating, feedback and program-testing
mechanisms which do exist. Here is a list of some of the familiar mechanisms, linked in practice by
complex social, political and bureaucratic processes through which judgments about the value and
worth of public agency programs and services are regularly rendered.

Official Public Mechanisms

Legislative/Council Review

Program authorization hearings and debates;
Budget review, hearings and debates;
Public hearings; and
Oversight hearings (or studies).

Executive Review

Program and budget review;
Special studies (e.g., study group, task force, commission, etc.);
Site visits and reports;
Audits (financial and sometimes program); and
Organizational and management analyses.

Judicial Review

Court rulings on issues of eligibility, rights, due process, etc.

General Public Review

Candidate selection through primaries and elections;
Communication between elected officials and constituents;
Consumer and client complaints and grievances; and
Media reporting (general and investigative).

Professional Review

Professional contribution to program design and implementation;
Testimony and reports by program and problem experts;
Criticism and commentary by social scientists; and
Policy statements and commentary by professional associations.

Special Interest Review

Case-stating, criticism, commentary, evidence-reporting and "pressure" by organized
interest groups.

Review Through Market-Like Mechanisms

Competition among claimants for social resources.

Economic Market Mechanisms

Valuation through exchange, competition and pricing.

The degree of visibility, institutionalization and effectiveness of the functions performed by these
mechanisms does, of course, vary. They may range from highly articulated and distinctive mechanisms
to those which are rudimentary, informal and episodic. In some instances, no identifiable mechanism
may exist at all.

Some proponents of formal program evaluation find the assorted legislative, administrative,
judicial, social, economic and political mechanisms (and related processes) for evaluation
inadequate: vested with special interests, preoccupied with political considerations and bereft of
adequate "formal" evidence on the basis of which informed ("rational") judgments could be made
about program design, funding and redirection. These mechanisms remain, however, among the
most dominant and widely used vehicles by which collective social judgments are expressed about
the use of resources in public programs. And they are the primary mechanisms through which the
results of formalized program evaluation must be used, if they are to be used at all.

Some proponents of formal evaluation recognize the constraints that existing political processes
and mechanisms place on the utilization of formal evaluation results. They sometimes suggest
reform of the institutions and processes of politics as a way to increase the use of evaluation. There is
no question that political and administrative institutions and processes can be reformed in attempts to
increase their accountability. In general, however, it is a common error to miss or dismiss the value of
existing social processes of interaction as available mechanisms for collective program evaluation.
(Lindblom, 1965; Lindblom and Cohen, 1979)

The Emphasis of Program Evaluation 

Every major method and approach to "improving" the performance of an organization focuses on,
accents or "emphasizes" some aspects of an organization's behavior, structure, functions or
processes (e.g., Kimmel, Dougan and Hall, 1974). In most cases, a "new" approach tends to make more
explicit, more formalized and more central a set of functions which are already being performed
though perhaps in a more implicit, informal and less sustained way. Management by Objectives
(MBO), for example, focuses on internal short-range management goal and objective setting. It is
intended to induce joint objective setting between superiors and subordinates and thereby increase
communication between them. Program Evaluation and Review Technique (PERT) is intended to
improve an organization's capacity for defining and relating tasks, for work scheduling and for
determining optimal or critical paths through complex interrelated activities by estimating and comparing their
time and/or cost requirements. Organizational Development (OD) is intended to improve
organizational performance by improving employee self-consciousness and interpersonal relations.

The literature both states and implies that program evaluation is intended to "rationalize decision
making," to provide valid information on the performance of a program, and thereby to improve
decisions about the design, level of funding, operation and management of a program. Similar claims
are made for several other methods and approaches urged for use in public agencies, for example,
program planning, policy analysis and needs assessment. A brief comparison of program evaluation
with these three approaches will help highlight distinguishing features of program evaluation. Table 1
displays selected major features of these approaches which are reflected in the philosophy and
general logic set out in the literature. These are, obviously, "average" representations. In all cases
there are large and small variations from writer to writer and from application to application. The
comparative table suggests these highlights:

General Contrasts 

In a very summary way the four approaches have these general orientations:

Program planning focuses on the use of future resources to achieve a set of tentatively
established goals and objectives over a multi-year period. This approach typically rests on a
comprehensive view of an agency's programs and on estimates of future conditions, costs and expected results
of programs, some of which are yet to be formulated.

Program evaluation, by contrast, focuses on a program already formulated and operating. The
attempt is not to forecast or predict the future but to retrodict the past: to identify, gauge, and judge
the value of the results which the program has already generated. It addresses the basic questions:
What difference has the program already made? Is the program worthwhile? Does it work? How
might it be improved?

Policy analysis focuses primarily on an existing or likely policy problem, its structure, seeming
causes, possible policy responses and a comparison of alternative responses in terms of their
estimated costs and effects. Though a policy analysis may include consideration of an existing
program, attention is not limited to any existing program alternative.

Needs assessment represents an attempt to identify and assess the types and extent of
perceived, reported or inferred "needs" in a defined population group. Existing programs are relevant to
this approach in attempts to identify what are perceived to be gaps in existing services.

It is clear that these four approaches overlap (see Kimmel, 1977, and Morrill and Francis, January
1979).

Table 1

A Selected Comparison of Four Proposed "Rational Aids" To Decision Making

MAJOR CONCEPTUAL SOURCES

  Program Planning:
    Business management theory
    Economics
    Decision theory
    Forecasting
    Planning theory

  Program Evaluation:
    Social science field research, especially from psychology and sociology
    Some economics and engineering
    Statistics

  Policy Analysis:
    Economics
    Choice theory
    Decision theory

  Needs Assessment:
    Eclectic (unclear):
      Psychology
      Social work
      Survey research

MAJOR CONCEPTS

  Program Planning:
    Resource constraints
    Budget costs
    Policy goals and objectives
    Program alternatives
    Tradeoffs
    Input-output relationships
    [illegible in source]
    Multi-year forecasts:
      Conditions
      Costs
      Results

  Program Evaluation:
    Program goals and objectives
    Program outcome, impact and results
    Criteria of outcome and impact (or other change)
    Measures or indicators of the criteria
    Comparisons of changes in the measures or indicators
    Comparisons with and without the program

  Policy Analysis:
    Systems view of a problem
    Structure of a problem
    Causes of a problem
    Alternative responses to the problem
    Costs and benefits of alternatives
    Criteria for choosing among alternatives
    Constraints

  Needs Assessment:
    Needs:
      Individual
      Community
      Met
      Unmet
    Assessment:
      Estimating
      Valuing
      Judging
    Gaps in service:
      Estimated needs juxtaposed to existing services

DOMINANT FOCAL POINTS

  Program Planning:
    The future:
      Mix of old and new problems
      New goals and objectives
      Estimated resource availability
      Possible alternative courses of program development
      Estimated outputs, outcomes, impacts

  Program Evaluation:
    An existing operating program
    Changes which result from the program, especially among clients or problem conditions
    Judgments about program worth based on observed, measured and inferred changes
    Program performance

  Policy Analysis:
    Existing or expected policy problem
    Specification of alternative responses to the problem
    Comparison of alternatives

  Needs Assessment:
    Perceived or inferred needs of population groups:
      Community
      Population at risk
      Target population
      Service population

INTENDED USES

  Program Planning:
    Development of multi-year plans
    Context for current decisions about the use of future resources
    Recommendations on current decisions
    Rationalize decision making

  Program Evaluation:
    Feedback on the results of existing program
    Improve program management
    Increase efficiency and effectiveness of programs
    Improve program design
    Rationalize decision making

  Policy Analysis:
    Analytical input to decision making about a problem
    Planning
    Program design
    Resource allocation
    Rationalize decision making

  Needs Assessment:
    Planning
    Priority setting
    Resource allocation
    Rationalize decision making

MAJOR DATA SOURCES

  Program Planning:
    Multiple and varied
    Time series data
    Analytical studies
    Evaluation studies

  Program Evaluation:
    Multiple and varied depending on the indicators and measures selected
    Frequent emphasis on new data collection

  Policy Analysis:
    Multiple and varied depending on the nature of the problem
    Emphasis on the use of existing studies, information and data

  Needs Assessment:
    Opinions of experts and groups
    Field surveys
    Social indicators
    Demographic indicators
    Epidemiological studies
    Incidence and prevalence studies
    Secondary data analysis

ORGANIZATIONAL LOCATION

  Program Planning:
    Staff office serving decision makers

  Program Evaluation:
    Staff office serving decision makers
    Often performed outside the agency

  Policy Analysis:
    Staff office serving decision makers

  Needs Assessment:
    Rarely discussed
    Often performed outside the agency

Major Conceptual Sources 

The major conceptual or philosophical sources of much formal program evaluation appear to be 
the field research components of several social sciences, principally psychology and sociology. 
Some concepts and methods from economics and engineering (effectiveness, input-output relation- 
ships) are also employed. Many of the concepts for policy analysis come from microeconomics, 
decision theories and theories of choice. The sources of the concepts of needs assessment are 
unclear and eclectic, though they are probably derived from psychology and social work. Those of 
program planning are eclectic: business management theory, economics, forecasting and formal 
planning theory. 

Major Concepts 

The major concepts of program evaluation include an emphasis on program goals and
objectives, measures or indicators which are to be derived from those goals and objectives, changes
which occur due to the operation of the program, outcomes or impacts of the program reflected in an
appraisal of those changes, and a set of judgments about the value or worth of the program. While a
consideration of the costs of a given program or one of its elements may or may not be part of an
evaluation, the notion of resource constraints is central to policy analysis. In principle, both
approaches (evaluation and analysis) attempt to estimate "net benefits" of a program and both attempt
to establish some notions about "cause and effect." These considerations are normally absent, for
example, from needs assessment approaches.

Dominant Focal Points

The dominant focal points of program evaluation are judgments about the value or worth of a 
program, about probable causes and about results, based on measured changes which can be 
attributed to the program. Those of program planning are estimates of resource requirements and of 
the expected costs and results of a future mix of programs directed toward some tentatively 
established goals and objectives. Needs assessment focuses on unmet needs of a defined population
group. Policy analysis focuses on existing or expected policy problems, and on an explicit
comparison of alternative responses to those problems, whether or not a program already exists.

Intended Uses

The intended uses of all the approaches are expressed in claims that they will "improve the
rationality of decision making": about future plans and priorities in the case of program planning;
about major program, resource and management decisions in the case of program evaluation; about
decisions on policy issues in the case of policy analysis; and about future plans to fill unmet needs in
the case of needs assessment.

Major Data Sources

All the approaches appear to require a mixture of existing and new data. The approach of policy
analysis usually includes an injunction to use existing data, studies and analysis creatively. The
approach of needs assessment emphasizes new data collection, often through the use of surveys.
Program evaluation focuses on the measurement of outcomes and impacts. Because existing data
for those purposes are often scarce, the approach usually requires new field data collection,
sometimes of an extensive variety.

Organizational Location 

The prescriptions for program planning, program evaluation and policy analysis recommend that 
these activities be located in a staff office serving key decision makers. Needs assessments and 
program evaluation studies are, however, usually conducted by groups outside the agency.

Relationships Among the Approaches

Writers on program evaluation, like those on other formalized approaches which are urged on
public agencies, often comment on the imperative or "logical" role of the formal approach in the affairs
of an organization. Though represented in many alternative ways, a common diagram depicts
several proposed management functions in a cyclical relationship something like this:

    Program Planning ----------> Program Development (Design)
           ^                                 |
           |                                 v
    Program Evaluation                   Budgeting
           ^                                 |
           |                                 v
           +------------------ Program Implementation


Here the approaches to decision making are presented in a closed loop of interdependent activities.
In this cyclical sequence, one function leads directly to another. Plans are first developed, then
programs are designed. Approved programs are then funded through budgeting and then
implemented. Once operating, a program is evaluated and performance information is fed back to
decision makers for use in yet another cycle of functions.



This representation is, obviously, an ideal model. It is derived not from empirical observations of or
operations of real organizations but from an idealized, technical style of thinking based on a variety of
"rationality" (there are many varieties) which has this step-wise form:

First, identify goals and objectives (Plan). 

Second, specify alternatives to reach these objectives (Programs). 

Third, compare the alternatives (Analysis).

Fourth, select the best one (Choice). 

Fifth, fund the chosen alternative (Budget). 

Sixth, set the program into operation (Implementation). 

Seventh, assess the program in terms of its results (Evaluation).

Eighth, repeat the cycle.

While it has the appeal of simplicity, this rational, sequential model is rarely, if ever, followed in actual
public agency practice. There are several reasons. Many of them spring from the basic nature and
role of a public agency in a changing environment of social and political interaction. (Lindblom, 1965;
Pressman and Wildavsky, 1973)

What Methods Are Proposed for Program 
Evaluation? 

When earmarked funding for evaluation purposes first began in the Department of Health,
Education, and Welfare, for example, it was assumed by many that the approaches and methods
which would be employed in program evaluation studies would range in type and variety depending
on the nature of the program to be evaluated. Case studies, surveys, field interviews, self-evaluation
and informed observation and analysis had long-standing use. Critics argued that many of these
techniques were weak tools for evaluation because they did not ensure adequate "objectivity," were
sometimes used in an "unsystematic" way and did not always provide "reliable" data on the basis of
which the "causes" (of, say, a change in student learning progress) could be established.

Other formal methods were also already in use in some government settings. Cost-benefit
analysis, for example, had been applied in a number of areas including the comparison of alternative
disease control programs. Methods of cost-effectiveness and systems analysis were widely
discussed and applied with varying degrees of success. As we noted earlier, however, the application of
these techniques frequently relied on the use of quantitative data, which often did not exist. Critics of
these methods also claimed that they placed excessive emphasis on the economic aspects of
programs and not enough on changes in the program client (social and psychological), on the
impacts which a program might have on an institution (a school or hospital) or on the wider community
(mental health or the frequency of delinquency).

Social scientists, especially psychologists and sociologists who claimed a tradition of research,
argued that the methods of field evaluation should come from the social sciences. There arose rather
quickly what has since become a long and continuing discussion, debate and argument among
interested parties over what methods and approaches were the more appropriate, reliable and
preferred for program evaluation purposes. In his early and widely read book Evaluative Research
(Campbell (1979) calls it "the founding book"), for example, Suchman (1967) stated that he wanted to
retain the term "evaluation" in its most common sense usage as referring to the general process of
assessment or appraisal of value. (p. 7) When he turned to his main subject, however, he introduced
this special condition:

Thus from the beginning we would like to make it clear that we do not view the field of evaluation as
having any methodology different from the scientific method; evaluation research is, first and foremost,
research and as such must adhere as closely as possible to currently accepted standards of research
methodology . . . ultimately the significance of the results must be determined according to the same
scientific standards used to judge nonevaluative research. (p. 12)

Four chapters later, however, Suchman admitted:

Examples of evaluative research which satisfy even the most elementary tenets of the scientific
method are few and far between. (p. 74)

In 1970 in another early, influential and often-referenced book on evaluation, Wholey and his
associates (1976) speak of "formal, organized evaluation" this way:

In this sense evaluation is research, the application of the scientific method to experience with public
programs to learn what happens as a result of program activities. (p. 19)

Caro (revised edition 1977), another widely read author, opened his edited collection of writings on
evaluation, in the early seventies, with a general definition and then added his own emphasis on
formal research:

Program evaluation has two essential dimensions, one concerned with judgment and the other with
information. Programs are conducted to achieve a goal, end, or outcome that is valued. Program
evaluation produces judgments regarding the degree to which desired outcomes have been achieved
or can be achieved. It leads to conclusions regarding the worth of organized efforts. Information is of
critical importance in the evaluation process. Performance as known through verifiable procedures is
related or contrasted to goals. The method through which such information is obtained is often a
central point in evaluation. (p. 3)

Caro then mentions several alternative methods of evaluation: those we use in everyday life;
accreditation, for example, through licensing; and cost analysis. He, like Suchman, then opts for
evaluation research:

Evaluation research may be considered a third tradition that is distinguished by its central concern for
outcomes of treatment. It attempts to determine whether changes sought through an intervention
actually come about. Further, evaluation research is concerned with the question of whether observed
changes can reasonably be attributed to the intervention. Evaluation research, therefore, makes use
not only of scientific method but procedures designed to test for causal connections. It is the evaluation
research approach with which this volume is primarily concerned. (p. 5)

Suchman, Caro and other social scientists (turned evaluators) insist that the most reliable and
appropriate methods of program evaluation are the methods of social science. In its most extreme
formulation, this view proposes that the preferred method of evaluation is "experimental" or
"quasi-experimental" (Campbell and Stanley, 1966; Cook and Campbell, 1975). Only these methods, some
believe, are scientific and reliable and will yield valid, objective and impartial results.

A wide range of limitations and difficulties with these methods, however, has been experienced in
actual program evaluation settings. Their practical payoff has not matched their early optimistic
promise. Consequently, a reconsideration of more reasonable, feasible and satisfactory methods of
program evaluation has been occurring.

Scriven (1976), for example, underscores that evaluators "must frequently face the need to do the
best we can with non-experimental data." He sketches the modus operandi (MO) method of
identifying probable cause. This method is familiar in the approaches of the detective, coroner,
clinician, historian and anthropologist who employ "causal checklists" and "pattern recognition" to
establish probable cause. It also appears to be the method used by the mechanic, doctor, consultant,
diagnostician, specialist or trouble-shooter in almost any field. It is likely a major approach used (if
only implicitly) by the successful program manager and decision maker. It is seemingly commonplace
also as a general method used by the experienced evaluator, especially at the point where the powers
of formal methods end (and they all do) and probable cause must be inferred from a variety of partial,
provisional, complex and incomplete information. Scriven suggests that "the main thrust of efforts
toward sophistication [in evaluation] should now turn from the quasi-experimental toward the modus
operandi approach." (p. 108)

Recently, Alkin, Daillak and White (1979) also expressed a view quite different from those of
Suchman and Caro. Wondering why there appeared to be so many "wasted" evaluations, they open
their examination of five cases of evaluation utilization this way:

Why should we even be concerned with this question [waste]? The answer is to be found in the
fundamental distinction between evaluation and research. One of the authors of this book, along
with others in the field of evaluation, has felt that the clear understanding of the distinction between
these two kinds of studies is essential to the development of evaluation theory and ultimately to the
practice of evaluation (Alkin, 1973). On the one hand, there are studies designed primarily to add to the
body of knowledge (research); on the other, those studies designed primarily to provide information for
decision-making (evaluation). And these two functions are separate and distinct. The following typical
comment provides a case in point: "The study was appropriate even if the results were not utilized since
its redeeming feature is its intrinsic value and its contribution to the corpus of knowledge." Such a
statement is appropriate as a comment on research but not on evaluation. (pp. 13-14)



Finally, some evaluation experts appear to strike a neutral ground with respect to evaluation
method. In a recent discussion of the relationship between zero-based budgeting and evaluation, for
example, Wholey (1978) emphasizes systematic measurement of program performance, but here
he does not specify methods:

In this book we use the term program evaluation to mean the systematic measurement of program
performance (resource inputs, program activities undertaken, resulting outcomes or impacts), the
making of comparisons based on these measurements, and the communication of evaluation findings
(measurements and comparisons) for use by policymakers and managers in decisions on government
programs. (p. 47)

There are, in short, divergent opinions held by reputable parties about what types of methods (and
what canons of proof and evidence) are appropriate to and adequate for program evaluation
purposes. These divergencies are explained partly by the existence of different schools of opinion
about what constitutes reliable information and knowledge, what constitutes evidence of "cause" and
what is the nature of the public agency evaluation problem in the first place.

Table 2 identifies broad groupings of approaches which are available for program evaluation. The
groups of methods clearly overlap and are used in varying combination. For example, all the methods
are dependent on and are used in combination with ordinary intelligent observation and analysis (II).
Many employ the modus operandi method (IV). The results of all public agency program evaluation
must be used through interactive social processes (I). (Lindblom and Cohen, 1979)

Table 2

Some General Approaches and Methods Used In Program Evaluation

I. Evaluation Through Interactive Social Processes: e.g.,
     Political Processes of Bargaining and Adjustment;
     Market Processes;
     Mixed Processes; and
     Other Social Processes.

II. Ordinary Intelligent Observation and Analysis.

III. Conventional Methods of Investigation: e.g.,
     Observation;
     Fact Gathering;
     Historical Analysis;
     Contextual Analysis;
     Data Synthesis and Analysis;
     Inferential Reasoning; and
     "Guesstimating."

IV. "MO" (Modus Operandi) Method: e.g.,
     Use of Implicit or Explicit Causal Checklists and Pattern Recognition
     by Clinicians, Coroners, Detectives, Troubleshooters, and Others.

V. Systematic Analysis: e.g.,
     Cost-Benefit Analysis;
     Cost-Effectiveness Analysis;
     Systems Analysis; and
     Policy (Program) Analysis.

VI. Formal Social Science Approaches: e.g.,
     Ex Post Facto Design;
     Pretest, Posttest Design;
     Quasi-Experiments;
     Controlled Experiments; and
     Others.

VII. Other Methods (Recognized by the Reader).

VIII. Mixed Methods (Combinations of the Above).

Since the problems to which different programs are directed and the conditions under which they
operate vary widely from instance to instance, there is no way to prescribe in advance which category
of methods would be superior to any other. A general pragmatic rule is: fit the method (tool, approach
or process) to the problem, and not vice versa. There are at least two steps which can be taken prior
to a decision to evaluate a program formally: (a) preparation of a brief program evaluation issue
(problem) paper, or (b) conduct of an "evaluability assessment." Either of these pre-evaluation steps
will assist in determining which, if any, evaluation methods and approaches might be most suitable in
a specific situation. The steps are discussed in part VI. We turn next to a selected set of basic issues
raised by formal program evaluation when applied in practice.

Selected Issues Raised by Formal
Program Evaluation



Attempts in the last 20 years, of which I am aware, to introduce formalized technical aids to
decision making (e.g., program budgeting, policy analysis, planning-programming-budgeting (PPB),
management-by-objectives (MBO) and performance monitoring) into the complex, interactive and
political environment of a public human service agency have raised a variety of issues, questions and
dilemmas. Formal program evaluation is no exception, as the literature shows. (Knezo, 1974;
Chelimsky, 197?; Patton, 1978) Many questions relate to conceptual, technical, or methodological
issues of the formal approaches to evaluation. While these issues are important, there is another
class of issues which seems basic and persistent; namely, the general problem of "fit" (or misfit)
between the assumptions and requirements of traditional formal program evaluation and the
processes of political economy which make up the normal environment of public human service agencies.
This part discusses a small selected set of these issues: expectations; the so-called criteria and
indicators (measures) problems; "causation" as a prescribed focus of one branch of evaluation; the
consequences of adapting general program designs to local circumstances; the degree of control
which public agencies exert over programs and problems which may be evaluated; and the sense in
which program evaluation is "research" and/or "science." The discussion begins with our
expectations.

The Role of Expectations

Our expectations heavily color our judgments of the results of what we do, i.e., our evaluations. If
expectations are extremely high, we may view modest results with disappointment as shortfall or
failure. By contrast, if expectations are very low, it may not take much to satisfy them. The same
modest results may now look better, like progress or success. This basic psychological relationship
between what we look forward to (expectation) and what we get (results) lies at the base of both
individual and collective evaluations.

This phenomenon is akin to the differences in perception by which one may see the same glass of
water as either half full or half empty. The same phenomenon occurs when an evaluation result of 25
percent is viewed as either a little or a lot. (Weiss, 1973) In all cases, results depicted by a formal
evaluation will be measured against implicit or explicit expectations about anticipated results.

At the national level, the perceived shortfall or failure of many of the programs of the era of the
"Great Society" can be attributed in part to what can be seen in 20-20 hindsight as high, if not
unrealistic, expectations about what was both possible and probable. Viewed from a contrasting and
optimistic vantage point, Wattenberg (1978) examined a wide array of evidence on changes in
individual, social and economic conditions from 1960 to 1976 and found ". . . that behind the
harum-scarum headlines a great deal of remarkable progress has occurred in the United States in
recent years." (p. xi)

At the level of individual human service programs, it matters greatly whether evaluators, program
managers or other influentials who participate in decisions on funding, program design and management
expect large results, small ones or none at all. Whether our judgments are that programs
work or do not, that they pay off or not, or that they are worthwhile or worthless is in no small part a
direct function of our expectations about them.

Similarly, when we put our management tools and approaches to tests of worth (for example, MBO,
policy analysis, PPB or program evaluation), we judge them better or worse largely as a byproduct of
the types and levels of expectations we have about what good they will do in the first place. The
history of the "success" or "failure" of program evaluation is itself written against the backdrop of what
we hoped, were told, were sold or otherwise came to expect would be its value. It is clear, then, that
we can adjust valuations of worthiness of programs not only by changing them or by changing the
measuring tools by which they are gauged, but also by adjusting expectations about them.

Values in Evaluation

It is sometimes assumed that formal evaluation will substantially reduce if not eliminate the
intrusion of values into decisions about programs. Because it will presumably be based on the
"impartial" generation of verified "facts" and conclusions through the application of "reliable"
methods, formal evaluation is sometimes viewed as relatively value-free or value-neutral. In practice,
however, formal evaluation is, like other modes of research and analysis directed at public policies
and programs, not value-free but value-embedded. Here are several of the many ways that values
enter, directly and indirectly, into all formal program evaluation processes.

1. Selection of the Program To Be Evaluated 

Time, resources, interest and common sense ensure that a public agency does not usually and
formally evaluate all of its programs at once. A selection of one or a few from among many is
necessary. The motives for an evaluation may be several and usually mixed: to comply with an
external mandate from an authorizing or funding source, to verify problems which seem to exist (low
morale, drops in productivity, excessive processing times, unusual costs, poor targeting, etc.), to
respond to an outside charge about program performance, to inquire into how the program actually
works, and so on. Whether there is a single motive or several, a decision to evaluate one program
rather than others is an act of selection. It focuses attention by subjecting one program to formal
scrutiny while sparing others. The resulting attention may change the image, aura, competitive
position or other conditions of the subject program compared with others. Both the motives which
lead to the evaluation and the act of selection itself are avenues along which values enter the
formal evaluation process early. The selection of one program for evaluation over others has been called
a political act.

2. Choice of the Evaluator(s)

For all the reasons that individuals in any craft or profession vary from one to another (skills, 
experience, competence, motivation, social philosophy, etc.), so do evaluators. The selection process
by which evaluators are chosen, whether informal and simple or formal and complex, will, by
intention, screen some potential evaluators (and their likely evaluation approaches) in and others out.
Since evaluators are not interchangeable, some additional measure of variable value orientation will
enter the formal evaluation process at this stage.

3. Negotiations Between the Evaluator(s) and the Client(s)

However focused an initial evaluation proposal, plan or design may be, negotiations between client
(sponsor) and evaluator are common and essential. In these negotiations, emphases, priorities,
measures, approaches, understandings, etc., will be further shaped.

4. Conduct of the Evaluation 

Few evaluation studies go precisely according to plan and design. Unpredictable field conditions
and barriers, unanticipated staff turnover, data shortfalls, misestimation of logistical and time
requirements, changes in the sponsor's mind and so on are the common challenges to evaluation
management. Substituting proxy measures for intended ones, modifying a planned sampling
method, trimming and redesign will all contribute to further shifts in scope, emphasis and approach.
Though some accepted guides and technical adjustments to field study exist as standard procedure
to deal with so-called "threats to internal and external validity," the path of field work is rarely smooth
or according to plan. It is normally bumpy and strewn with compromises.


5. Analysis of Results 

It is a common misconception that "data speak for themselves." Yet, like two witnesses who report
the same accident in different ways, two evaluators may interpret the same findings and data in
dissimilar ways. Members of the same evaluation team often come to different interpretations of the
same data. They may thrash out differences and compromise for the sake of a show of consensus
and unity in the final report. Some evaluation experts suggest that this practice robs the user of
legitimate alternative (divergent) interpretations and possible insights which may prove valuable if
not decisive. They urge the open submission of minority reports as a routine part of the presentation
of study findings. Re-analysis by outside analysts of the data from a completed evaluation may (and
often does) turn up new interpretations and conclusions.

6. Inferences From Findings to Recommendations

Discovering what is tells us little or nothing about what ought to be. Whether the findings of an
evaluation study are descriptive or explanatory, there are no ready-made rules for moving from
descriptive statements of "fact" to prescriptive statements about what ought to be done in the future.
Since program evaluation is carried out in a value-diverse environment of competing claims for public
and social resources, the formal evaluator must invoke assumptions, chains of reasoning, supplementary
knowledge, theory and social philosophy to move from findings to recommendations
about program change. The strict researcher may be inhibited by professional norms from getting
"too far beyond the data." Yet public agency users are often interested in moving well beyond the data
to guidance about "What should we do now (next)?" Depending on its length, the inferential leap from
so-called fact or finding to prescriptive action may traverse a lot of value territory.

7. Use of Study Findings in Policy Debates

If and when evaluation study findings are invoked in policy discussions, the net of interpreters is 
enlarged, usually well beyond the original client and evaluator(s). New actors usually bring somewhat
different perspectives, assumptions, chains of reasoning, experience-based knowledge, incentives
and social philosophy to the interpretation of study results. Since no study ever covers the waterfront
of a program or presents findings and conclusions with equal clarity, evidence and certitude, the
terrain of possible variable interpretation is substantially enlarged at this step in the program
evaluation process.

There are, in short, not a few but many complicated, blatant and subtle ways that values enter even 
the most technically and managerially scrupulous program evaluation process. This is not a problem
unique to formal program evaluation. Steps can be taken, for example, to keep major value shifts and
drifts as explicit as possible, avoid gross instances of willful bias and subject resulting work to
scrutiny from many points of view. Despite these efforts, program evaluation will remain
value-embedded rather than value-free or value-neutral. [Note: For one detailed attempt to cope
self-consciously with some of the value issues in practice, see the history of the attempt by the National
Institute of Education to evaluate the ESEA Title I program (compensatory education) for the
Congress (Pincus, 1980).]



Thus far we have brushed past a major step in formal evaluation where value issues and technical
issues are frontally or subtly joined. This occurs in the inevitable selection of the explicit criteria,
indicators and measures in terms of which program performance and impact will be formally
examined.

The Criteria and Indicators Problem



Traditional formal evaluation usually requires explicit measurement. Every attempt to gauge the
effects or impacts of a program formally requires the selection and specification of some criteria
(e.g., income, achievement, health status, etc.) in terms of which the output, outcomes, impacts,
effects or results are to be examined. But evaluation methodology is silent about what is to be
measured. For example, how should the effects be gauged of an educational program ostensibly
intended to improve student learning? What criteria should be invoked? The answer to these
questions turns partly on your point of view. When economists attempt to answer this question, they
often turn naturally to those criteria about which they know most: economic criteria. They may try to
estimate the change in future earnings attributable to, say, a program intended to reduce the high
school dropout rate. By contrast, the bulk of the attention of "cognitive" educational evaluators has
historically been paid to the effects of programs and practices on student school performance
measured largely in terms of grades and achievement test scores. By further contrast, a general
psychologist might look at the effects of the same program in terms of its influence on "noncognitive"
factors such as a student's sense of self-esteem, sense of control of his/her environment, or attitudes
toward risk-taking and uncertainty. Others may look to effects of a program or practice on personality
"traits" such as assertiveness, persistence, sense of responsibility, and self-control. Still others may
look at program consequences for general social skills required for interpersonal adjustment, coping
with change and stress management.

Once criteria have been chosen, for whatever reasons, specific measures or indicators must be
selected to reflect these criteria. If student "achievement" is to be measured, for example, which of
the many existing achievement tests should be employed? The formal evaluator may focus on the
important issues of technical validity and reliability of a given test. But selecting tests has not only
technical aspects but value dimensions as well.

Although educational research has been going on for decades, there is no consensus about what
should be measured or by which tests. So-called standardized achievement tests are deeply rooted
in the educational system. But some critics of standardized tests raise basic questions about the
extent to which they actually measure learning ability and accomplishment rather than the acquisition
of knowledge about the content of the dominant culture. Most experts agree that culture-free tests are
impossible (since education is itself a part of culture). Some do, however, propose tests that appear in
their view more culture-fair. Debate does not then stop, but normally shifts to what is "fair" and what is
"foul." The many debates about tests (their assumptions, their methodology and the social consequences
of their use) reflect in turn larger issues about (a) what purposes education does or should
serve in the society at large and (b) what influences educational philosophies and practices have on
the attitudes, values, skills, competencies and futures of individuals in the educational system. Many
and diverse economic, social, cultural and ideological views of education have come vividly to the
fore in policy and social debate in the last 20 years. They have been heated and virulent.

The point is clear: there are alternative and competing criteria and measures in terms of which the 
impact on students of an educational program can be measured. Reflecting on the more general role 
of education in society, Jerome Bruner, noted education expert, underscores the culturally embedded 
and value-laden nature of the study of child rearing and human development. His remarks amplify 
and set into a more encompassing framework the comments just made about the education criteria 
and measurement problems:

I would urge that in the nurturing of the young, a society is required to make a continual series of
decisions about its norms. Child rearing is neither a private activity nor is it "factual" nor dispassionate.
Since human development is as much determined from the outside in as from the inside out, its
guidance is as much a prerogative of the culture, as it is a reflection of the intrinsic growth of the
nervous system. It is in consequence of this position that the study of human development is so
implicitly guided by policy needs: how to raise or even define an intelligent human being, how to assure
the growth of a proper moral judgement or an adequately evolved logical capability, how to increase
independence or loyalty or tenderness, how to prevent alienation and anonymity in a technological
order or to maintain identity in the face of urban mobility. . . . So long as the culture arranges the
conditions under which growth is supposed best to occur, the study of human development must in
some inevitable way be a normative science, a policy science like economics. The shape of the
sciences of human development, either in the past or in the future, will to some considerable extent be
a result of subtle (and sometimes not so subtle) forces imposed upon it by the culture in which these
sciences exist. (Bruner, with the author's permission)

[Note. Neither the author of this monograph nor the author of this quote was able to relocate the exact
source. Similar ideas can be found in Bruner (1971).]

In sum, the evaluation measurement problem requires the selection of (decisions about) measures
and indicators which have analytic utility, that is, which presumably enhance understanding of the
phenomena they are designed to measure. The problem also raises value questions about which
aspects of individual and social development on which programs impact are worth measuring in the
first place. In program and policy studies, analysis and evaluation, the so-called criterion problem and
its allied measurement problem are at their heart also value problems. In practice, the choice of
measures may be made in some inductive and pragmatic way; the analytic and value questions are
"decided" simultaneously.

The existence of stated program goals and objectives may limit the range of choice among criteria
and measures. But program goals and objectives are often aspirational and the result of political
compromise. As a result, they tend to be general, abstract, multiple, often conflicting and evanescent.
Those who initiate and conduct evaluations are ordinarily forced to choose, if not invent, alternative
criteria and measures for evaluation purposes. While there are no universal operational guides to
these choices, here are five general rules of thumb which appear useful.



1. Employ measures which are of expressed interest to the evaluation sponsor(s) or expected
users.

2. Employ multiple measures, when feasible, rather than just one. 

3. Acknowledge the value implications of selecting criteria and measures and make value
decisions openly and consciously.

4. Consciously select measures from more than one value set. In the education example, this
may imply choosing not only from among "cognitive" measures of performance (such as
grades and achievement scores) but also from among "non-cognitive" measures as well
(such as measures of self-esteem, sense of "internal-external" control, etc.).

It is worth noting that while cognitive and non-cognitive measures refer to different sets of 
factors which may be associated with learning progress, their mutual interactions and 
relative contributions to learning are still basic open questions. 

5. Keep intangibles in the foreground and not the background of the analysis. Many aspects of
programs may be judged important and yet not be susceptible to measurement. Measurability
is in no way an index of importance. Tellingly, Campbell, a major exponent of
experimental and quasi-experimental evaluation research, carries the point further:

Too often quantitative social scientists, under the influence of missionaries from logical positivism,
presume that in true science, quantitative knowing replaces qualitative, common-sense
knowing. The situation is in fact quite different. Rather science depends on qualitative,
common-sense knowing even though at best it goes beyond it. Science in the end contradicts
some items of common sense, but it only does so by trusting the great bulk of the rest of
common-sense knowledge. Such revision of common sense by science is akin to the revision of
common sense by common sense which, paradoxically, can only be done by trusting more
common sense. (Campbell, 1979, p. 70)

Evaluation and "Causation"


The branch of evaluation which flows directly from the formal research components of some social
sciences carries with it an aspiration to establish the "causes" of program effects. While the search for
causes of events appears to be a central concern of some social science research, it seems beyond
the power and ken of applied program evaluation. The reasons are several. First, establishing
"causation" in a meaningful way is a complex problem which is itself a subject of study by
methodologists, epistemologists, and philosophers of science. Second, the social sciences reportedly
have great difficulty establishing reliable and verified causal relationships even in the case of
the more constrained problems studied in "the laboratory." (Campbell, 1979; Almond and Genco,
1977) Establishing formal "causes" in the open, contingent, evolving and highly interactive world in
which real programs operate is a much more strenuous and complex task. Third, well short of
establishing the "cause-effect" of specific individual programs, there are major and unsettled questions
of causation associated with nearly all the basic human, social, economic and cultural problems
that are the targets of human services programs in the first place: incentives for human learning,
mental illness, alcoholism, delinquency, work incentives, ill health and the like.

For example, some health programs encourage and/or pay for visits to the doctor. While doctors
play an important role in circumstances such as acute illness and medical emergency, medical
services appear to account overall for substantially less than 10 percent of the variation in the health
status of the population. Acknowledging the important role of genetics (Luna, 1973), most experts
tend to agree with Knowles (1977) that "The health of human beings is determined by their behavior,
their food, and the nature of their environment." (p. 57) We do very much more to determine our own
good or ill health than medical services do for us. According to Knowles, "Prevention of disease
means forsaking the bad habits which many people enjoy — overeating, too much drinking, taking 
pills, staying up at night, engaging in promiscuous sex, driving too fast, and smoking cigarettes. . . ."
(p. 59) We seem intent on making ourselves sick.

In a similar vein, Lewis Thomas, president of the Memorial Sloan Kettering Cancer Center in New 
York City, reviewed the "science and technology of medicine" and concluded that despite progress:

We are left with approximately the same roster of common major diseases which confronted the
country in 1950, and although we have accumulated a formidable body of information about some of
them in the intervening time, the accumulation is not yet sufficient to permit either the prevention or the
outright cure of any of them. (In Knowles, p. 37)

Health promotion and disease prevention appear to lie not dominantly in medicine but much more 
heavily in our relationship with our environment and in the things we do to and for ourselves.
(Eckholm, 1977; Sobel, 1979; Dubos and Escande, 1979) 


A fourth factor which inhibits the evaluator from reliably attributing "causes" directly to program
effects is the multiplicity of other programs which may be operating in the environment of the program
under evaluation. The interaction effects among the several programs may make separating the
effects of any single one of them impossible. Gorham, president of the Urban Institute, and Glazer, an
associate, point to this problem in an examination of the "urban predicament." (1976) Based on their
review of poverty programs of the late sixties and early seventies, they concluded that "some of the
changes that reduced poverty from 19.3 percent in 1964 to 12.8 percent in 1968 could be traced in
part to the poverty program." But they also pointed to a problem endemic to imputing "cause" to a
specific program intervention which operates in a complex environment:

The overall evaluation of poverty programs and model cities will probably always be in dispute.
Reliable evaluation is hindered by the fact that the poverty programs were only a few of a large number
of factors affecting income and "opportunity." Of the other forces that were increasing income, perhaps
the most important was the very high economic growth the United States was experiencing during the
late 1960's. Equal opportunity was given a great boost by the passage of new civil rights legislation in
1964 which banned discrimination in employment. Sorting out the contribution of each of these factors
is, at least for the present, beyond our most perceptive evaluators. (p. 11)

A fifth and final factor (mentioned here) which muddies "causal" waters is the extent to which any
given program design takes on highly diversified and variable configurations from setting to setting as
general elements of design are fitted to local circumstances. The phenomenon of local adaptations
deserves further comment.

Local Program Adaptations

In terms of the specificity of their designs, public human service programs range from the general
(block-type and formula grant) to the highly specific (such as programs for remedial reading). The
design of many others falls somewhere in between. The programs reflect a basic, though often
general, structure, may include prescribed elements (and other design features), and are often
directed to a target problem such as drug abuse and/or a target group such as elementary school
children. Examples include community mental health centers, neighborhood health centers, compensatory
education programs and a wide range of other State and locally operated human service
programs.

These "model" designs are often based on the assumption that they can be implemented (much as
they are) in wide-ranging local contexts. But few if any programs fill in the fine details of structure,
scope, level, intensity, pattern of management, program configuration, staffing arrangements and the
like, all of which and more are required to turn skeletal designs into viable, operating service
programs. They do not, because they cannot, supply a detailed operationalized recipe for effective
implementation.

Program designs must be fitted, tailored and adapted to suit specific, concrete local circumstances 
or they will fail. As a consequence, formal evaluators who have looked closely have found, sometimes
to their dismay, that the variability from site to site of program operations is large if not
enormous. In a recent attempt to identify the types of and preconditions for "coordinated planning"
between health and mental health planning in nine States, for example, evaluators reported that the
diversity and number of conditions and barriers was "almost overwhelming." (Hagedorn,
1980) Similarly, formal evaluation approaches which employ statistical measures to summarize
across large numbers of projects fail to capture the important individual character and variation of
local program adaptations. Instead, they often successfully mask, rather than disclose, what seem to
be among the basic determinants of program performance.
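
The masking effect just described can be illustrated with a minimal sketch. The site names and gain
figures below are invented for illustration and come from no study cited here; the sketch assumes only
that each local site reports a single outcome gain. The pooled average suggests a modest positive
effect, while the site-level figures show some sites gaining substantially and others losing ground.

```python
from statistics import mean, pstdev

# Hypothetical outcome gains at six local sites of one program.
# These numbers are invented for illustration only.
site_gains = {
    "site_a": 12.0,
    "site_b": -3.0,
    "site_c": 9.0,
    "site_d": 0.0,
    "site_e": -6.0,
    "site_f": 6.0,
}


def summarize(gains):
    """Return the pooled average alongside measures of site-to-site variation."""
    values = list(gains.values())
    return {
        "mean": mean(values),      # the single figure a summary report would show
        "spread": pstdev(values),  # population standard deviation across sites
        "lowest": min(values),
        "highest": max(values),
    }


summary = summarize(site_gains)
print(summary)
```

In this hypothetical case the spread across sites is larger than the average itself, which is the kind
of local variation a single summary figure conceals.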

McLaughlin (1980), for example, appears to have come to this conclusion after participating in a
Rand study of several education programs: ESEA Innovative Projects, ESEA Bilingual Projects,
Vocational Education Exemplary Projects and the Right-To-Read Program. The Rand team spent 2
years examining local projects under the four programs and 2 additional years following those
projects under the two largest. Team member McLaughlin concluded that conventional formal
methods of evaluation do not work. She argues that many evaluation approaches merely assume a
"black box" between project or program inputs and their outputs or effects. She concluded: "The
contents of the black box, it turns out, matter more to project outcomes than do other factors that
evaluators attempt to calibrate and assess." (p. 42) She identified "a number of factors that are
generally ignored in special project evaluations, but that are required for a valid evaluation design."
They are factors not of program design but of the interaction of elements of a local context or program
setting:

1. Institutional support and receptivity, which included administrators' attitudes and support
of the project and a broad-based implementation strategy which involved all significant
actors and resulted in staff commitment;

2. The baseline capacity and expertise which local staffs possess at the start of a project and
which may vary widely; and

3. Available local implementation choices about how the project will be put into practice.
Successful "choices," in these cases, seemed to include training tied directly to the
concrete problems, expressed needs and suggestions of participants; concentration in a
few sites (rather than a shotgun approach) and a "critical mass" of supporting participants;
locally prepared rather than imported project materials; and routine staff involvement in
decision making about project approaches and materials. (pp. 43-44)

The preconditions of participation, timely feedback (which allows error correction), involvement 
and commitment made implementation "heuristic— a process of learning and adjusting, rather than a 
process of installation." (p. 44) Other local conditions and events unrelated to project design (for 
example, institutional climate, leadership style, cutbacks, teacher strikes) make up the natural
setting within which programs and projects are implemented. They heavily influence whether a
project makes a difference. McLaughlin concludes:

Yet these local factors are seldom in project evaluation models. A special project cannot be validly
assessed in isolation from its system context. (p. 45)

A direct corollary of the adaptations which must be made to make programs work satisfactorily is
their changing, evolutionary and developmental nature. Programs have natural histories. Fledgling
programs and projects operate differently from mature ones, though change may be regressive as
well as progressive. A continuing series of adaptations to changing program and project circumstances
leads to a regular succession of different program operations and configurations. [Note. See
White (1977) for an excellent brief account of the impact of changing context on an initial attempt
(which apparently failed) to install a performance (evaluation) monitoring system in a large urban
school system.] Impact, outcome and output-oriented studies usually miss the contextual and
operational dynamics of programs and thus the circumstantial and adaptational features which help
explain program performance.

Limits on Public Agency Control 

It is commonly assumed in much of the writing on evaluation that if human service programs are 
formally evaluated and found to fall short of performance standards or expected results, public 
agencies can correct them. In practice, a large share of the problems to which public programs are
directed are complex and varied in cause. The factors which give rise to many of them lie well beyond 
the reach of government in general and beyond the influence and control of specialized agencies. 


Similarly, public agencies are not autonomous agents free to change their programs and policies at
will or on a moment's notice. They operate instead within the familiar environment of social, economic
and political pressure and commitments which arise not merely from citizen wishes but also forcefully
from legislatures and councils, service providers, professional associations, commercial, financial
and industrial beneficiaries, other levels of government, other agencies, ambient electoral, party and
coalition politics and so on.

Some of the pressures which appear to be external are heavily articulated through the internal 
management and politics of the agency, which itself consists of additional sets of "stakeholders." The
interplay of internal and external pressures and the operating commitments of an agency substantially
circumscribe freedom to act either directly or decisively. Despite the size of the budget or the
illusion of command and power, agency officials often have influence over only a small (though
sometimes important) share of an agency's resources or of its overall operations. As more social
functions have been performed by public agencies, the number and hold of "stakeholders" on agency
resource allocation and management decisions appear to have increased and reduced the discretion
of agency leadership.

[Note. We thank Steven J. Brams of the Department of Politics, New York University, for emphasizing
the notion of "stakeholder."]

In short, individual public agencies have significant but circumscribed control over their own
internal operations and little, if any, over many of the problems they attempt to ameliorate or contain
through their services. As a consequence, global evaluation studies directed at society at large or at
the "root" causes of generic individual and social problems will probably yield less than we collectively
now know and not much of practical value to an operating agency. Limits on agency control help
explain why some past evaluation efforts have had little impact. Many program problems lie beyond
the reach of public agencies. As a consequence, program issues or problems should be selected for
evaluation partly in terms of the extent to which the sponsoring agency can influence the factors
associated with ameliorations or remedies. Public agencies neither control everything nor control
nothing. Evaluation studies should be targeted to areas and subjects about which agencies can
reasonably be expected to have some say.



Is Program Evaluation Research and/or Science? 

These questions may provoke word quibbles. But they also reflect additional evaluation issues
which have taken on philosophical, territorial, economic and political overtones. In its root meaning,
"re-search" (from Old French recerche) means "to seek out, to search again." The American
Heritage Dictionary (Morris, 1976) indicates that the word may refer to "scholarly or scientific
investigation or inquiry," or it may mean "to study thoroughly." It is in the more open and general
meaning of study that program evaluation can be usefully understood.

In a narrowly restricted sense, the view of evaluation as "scientific research" naturally raises the 
corollary question, "Is evaluation science?" In terms of the actual practice and performance of 
program evaluation, even in its most exemplary form, few experienced evaluators would argue with a 
flat "no." Though it may employ technical methods, formal program evaluation, like its cousin policy 
analysis, is art and craft and not science. (Wildavsky, 1979) 

A related and underlying question is, To what extent is "social science" science? Pursuing this
question might carry us afield, but it raises deep questions: What are the bases and alternative paths
to human understanding and knowledge? What constitute reasonable and workable canons of proof
and evidence in social science on the one hand and in political and bureaucratic decision making on
the other? What kinds of proof and evidence about what kinds of subjects matter in the political
economy of public agencies? These are more than idle questions. (See Lindblom and Cohen, 1979;
Almond and Genco, 1977; Campbell, 1979; Sharpe, 1976; Rein, 1976; Thorson, 1970.)



Summary and Conclusions

The discussion to this point deserves a brief summary and the restatement of some conclusions.

Stimulated by Federal requirements for and financing of formal evaluation as a precondition for
financial support of services, evaluation activities have grown by some accounts into a well-financed
industry. Though formal program evaluation has been mandated widely by the Federal
Government (and more recently by some State governments), ambiguity and uncertainty persist over
what means should be employed to generate acceptable evidence of program worth and value. Early
on, proponents of evaluation as formal research came to dominate the literature on evaluation, which
contains many authoritative prescriptions. For the most part they urged the use of traditional social
science research methods. Proponents of these methods often assert that they will generate
impartial, objective, verified and reliable information on the impacts and effects of programs and will
help identify their "causes." This information, it has been assumed, would, by sheer force of its
authority and weight, lead to program improvements by "rationalizing decision making."

Authoritative prescriptions for evaluation research have not led to notable successes, but rather to
disappointing and meager results. Casual evidence, a growing body of testimony from experienced
evaluators and evaluation sponsors, and the results of several studies suggest that the actual
performance of formal program evaluation has fallen far short of its early promises. Excessive
government mandates coupled with dubious advice about preferred "scientific research" methods
for evaluation seem to have led to frustration, some waste, very modest results and growing calls for
fundamental reforms in the philosophy and practice of program evaluation.

As it turns out, the basic activity of evaluation, ascertaining or fixing the worth or value of
something, is a commonplace, everyday human activity. Difficulty ensues when concern with
evaluation is shifted from the level of the individual to the level of collective social judgment. At a
collective social level, there exists a wide variety of political, economic and social mechanisms and
processes through which judgments are regularly expressed about the worth and value of social
programs. Although proponents of formal rationalistic approaches to evaluation find these mechanisms
faulty and wanting, they remain among the most dominant, available and widely used vehicles
by which collective social judgments are expressed about the use of resources in public programs.
And they are the mechanisms and processes through which the results of formal evaluation must be
used, if they are to be used at all.



Many textbook models of formal evaluation appear to derive from an idealized, technical and linear
(sequential) style of thinking and problem solving which does not "fit" well the environment of social
and political interaction and adaptation in which all public programs operate. Although many social
scientist-evaluators claim superiority for their preferred methods, a listing of some of the available
alternative approaches, methods and mechanisms for evaluation includes existing political and
bureaucratic processes, the exercise of ordinary intelligent observation and analysis, the use of
conventional and widely available methods of study and investigation (including the widespread use
of implicit or explicit causal checklists and pattern recognition), the use of a variety of modes of
systems and policy analysis, and many combinations of interactive and analytical methods of social
problem solving.

Attempts to apply formal research methods of evaluation in practice raise many basic issues. A 
brief discussion of some of them revealed the following:

1. Above and beyond the methods of evaluation which might be employed, our expectations
heavily color and influence our judgments of the results of what we do, including evaluative
judgments about social programs.

2. Despite myth and rhetoric, formal evaluation is not value-free or value-neutral. It is, instead,
value-influenced and value-embedded. The many avenues through which values enter the
practical processes of formal evaluation include (but are not limited to) the selection of a
program for evaluation in the first place, the choice of evaluators (and their preferred
approaches), negotiations between evaluator and sponsor, compromises and adjustments
required by field work, inferences required to move from findings to recommendations for
future action and variable interpretations of the same study findings ordinarily made by a
variety of actors involved in agency program decision making.

3. The selection of criteria and indicators (measures) in terms of which program performance
evaluation might be made involves a set of both technical and value judgments which are
intertwined and inseparable. Selecting criteria and indicators is not a mere technical
problem but at its heart also a value problem.

4. The branch of formal evaluation which derives from social science research traditions
aspires to establish the "causes" of program effects. Yet an inquiry into our knowledge
about a wide array of individual and social problems at which public programs are directed
suggests that knowledge is normally partial, provisional and often conflicting. In addition,
many other programs may operate in the immediate environment of the one under
evaluation and separating effects in any reliable way may be extremely difficult if not
impossible; "causation" is an active subject of study and debate by methodologists,
scholars and philosophers; and under the best of "laboratory" conditions the social scientist
appears to have great difficulty establishing reliable and verified "causal" relationships
even in the case of carefully selected and constrained research problems.

5. Though the philosophy of formal evaluation often assumes that a program design will be
implemented in recognizable form amenable to easy detection and study, a variety of
evidence suggests that the actual configurations and features of operating programs are
the result of varied and complex adaptations to specific local circumstances and conditions.
Many conventional methods of formal evaluation, especially summary statistical methods,
appear to miss or mask the very factors which appear to contribute to effective program
operations. These insights have led to a growing number of proposals for major reform of
evaluation theory and practice.

6. Much writing on program evaluation appears to assume that the program defects and
problems uncovered by formal evaluation can be corrected by the public agencies which
finance and operate them. In practice, however, individual public agencies have only
circumscribed (though often significant) control over their internal operations and little, if
any, over many of the problems which they attempt to contain or remedy through their
services.

7. Assertions to the contrary aside, a dozen years of experience with actual practice suggests
that program evaluation is art and craft and not science. An examination of some of the
prevalent problems of "fit" between formal research methods and program evaluation
carried out in actual public agency settings raises provocative questions about the extent to
which the social sciences are "science."

Next is a brief examination of some of the evidence about the difference that formal evaluation
appears to make in practice, followed by an identification of a few of the many proposals to reform
evaluation. The reader not interested in the details of studies may want to page ahead to the
Intergovernmental Lessons recited at the end of part IV or turn directly to the guidance for the
practitioner presented in part VI.




IV. What Difference Does Program
Evaluation Make in Practice?



If program evaluation is valuable and worthwhile to a public agency and to its decision makers, its
value and worth should be shown through the use to which evaluation results are put and through the
impact which evaluation has on the opinions, attitudes, decisions and actions of policy makers.

Unfortunately, though the literature on evaluation is large, and the claims for its value numerous,
there are surprisingly few documented studies of its impact. To find them, a broad-based key-word
search of several large abstract and information services was conducted. It yielded about 620
individual abstracts.

Project SHARE                                  130
HEW Evaluation Documentation Center            280
National Criminal Justice Reference Service    175
Dialog                                          35
HUD USER                                         0
Total Abstracts                                620

We screened these abstracts for relevant sources, received useful suggestions from interviewees
and colleagues, scanned issues of Evaluation magazine from 1972 to 1979, perused several
evaluation journals and reviews, and drew on our own library. In all, we examined in hard copy over
1^0 evaluation sources including manuals, case studies, articles, books and papers. We have
referenced only a tiny fraction of this material.

As a result of this partial but extensive search, we found no body of systematic or social scientific
studies which yield a "valid" and "verified" picture of the utility, uses, outcomes, impacts and side
effects of program evaluation in a public agency context. This may seem surprising in light of over a
dozen years of experience with evaluation, the expenditure of billions of dollars on evaluation studies
and the insistent demands by proponents of evaluation that expenditures of public funds should be
put to the formalized tests of evaluation to assess their impact, establish their worth and improve their
relevance and utility. How would program evaluation hold up under the scrutiny and demand for
evidence of worth and value which program evaluation is intended to bring to bear on public programs
generally? There appear to be no definitive answers to this question. In addition to the testimony of
experienced evaluators cited earlier, however, there is a small body of partial and fragmented
evidence which contains some clues.

Table 3 lists the major sources of evidence identified in the literature search. The list is followed by a
brief summary of the findings of each study and includes occasional comments on limitations of the
study or on its apparent significance. The sources are presented one by one roughly in the
chronological order in which they appeared. Since the sources are few, no overall summary is
provided.

Table 3

Sources of Evidence on the
Impact of Program Evaluation



Municipal Management and Budget Methods: An Evaluation of Policy Related
Research, Final Report. Volume I: Summary and Synthesis (Kimmel, Dougan
and Hall, December 1974).

Program Evaluation Within California State Agencies: An Assessment (Conner,
Rosener and Weeks, May 1976).

"Symposium on 'The Research Utilization Quandary'" (Weiss, Spring 1976).

"Factors Associated With Knowledge Use Among Federal Executives" (Caplan,
Spring 1976).

Assessment of State and Local Government Evaluation Practices in Human
Services (Baumheier et al., February 1977).

Interim Analysis of 200 Evaluations of Criminal Justice (Larson et al., May 1979).

"Lessons Learned From Federally Mandated Program Evaluation for Community
Mental Health Centers: Framework for a New Policy" (Flaherty and Windle, May 1980).

Utilization-Focused Evaluation (Patton, 1978).




Summaries of Selected Studies 


Municipal Management and Budget Methods: An Evaluation of Policy Related
Research, Final Report. Volume I: Summary and Synthesis. Volume II: Literature
Reviews. (Kimmel, Wayne A.; Dougan, William R.; and Hall, John R. Washington,
D.C.: The Urban Institute, 1974.)

A team of two analysts and two consultants at the Urban Institute under the direction of the author
conducted an extensive literature search for "research on the impact, utility, and effectiveness" of six
management and budget methods which might be employed by local government. The study was
one of 19 funded by the National Science Foundation to screen what they described as a "large body
of research on municipal systems, operations, and services" created over the last quarter of a
century. Each study was to locate, evaluate for internal and external validity, and synthesize for wide
dissemination the findings in each area.

The results of the literature search and review of program evaluation are summarized this way: 

A search of the literature revealed few empirical studies of the utility, impact or effectiveness of
performing program evaluation. One attempt to analyze the impacts of several evaluations was made
by Wholey (1973). Ten evaluations were examined in an effort to relate the type of evaluation
performed to the influence exerted on budget levels, service delivery and internal government
processes. Assessed by the author, four of the ten evaluations appeared to have some impact on
budget levels, five on service delivery and six on internal government processes.

In two dissertations, McLaughlin (1973) and Pearson (1973) examined evaluations of programs
funded by the U.S. Elementary and Secondary Education Act. The authors judged evaluation to have
succeeded or failed in terms of whether it exerted an impact on the management of programs or on
later policy proposals. Neither study judged evaluation to have succeeded.

The General Accounting Office (U.S. G.A.O., 1971) reviewed twenty-four evaluations (fourteen
completed, ten ongoing) performed for the U.S. Office of Education and found that, in the opinion of OE
officials, five of the fourteen completed studies were "of limited use" while the results of the other nine
were "adequate and useful."

A study by Eaton (1962) reported an unwillingness among professionals in two bureaucracies
(California Department of Corrections and Western V.A. offices) to disseminate evaluative findings that
might be considered discouraging or might reflect unfavorably on their organizations. Although there
appear to be some weaknesses in the design of this study, its findings relate to the potential utility and
effectiveness of evaluation. Evaluation cannot exert an impact on program and policy decisions if
findings are suppressed by the organizations for which the evaluations are performed.

In a related vein, a dissertation by Nielsen (1972) took for granted that most evaluations had exerted no
impact on program directors and attempted to discover why. He found that "non-use" of evaluative
findings seemed to be due in part to mismatch between the information generated by evaluation and
information "needed" by program managers. This explanation was offered as a companion to the
frequent observation that program managers are threatened by and hostile toward evaluation.

There appears to be, in short, a very limited body of evidence from research and formal study on the
utility, impact and effectiveness of conducting program evaluation. (pp. 37-38)

The discussion of program evaluation in the report ends with this concluding note: 

The evolution of the literature on program evaluation from the mid-1960s to the present [1974]
appears to reflect a disappointment in the capacity of formal evaluation to revolutionize public decision
processes. This may stem from a combination of an early "overselling" of evaluation's potential and
growing awareness of its sometimes severe limitations.

A single rule-of-thumb for potential users of evaluation might be that the probable "benefits" of an
evaluation ought to exceed its "costs," however these are determined. Programs to be evaluated
should be of sufficient budgetary importance that it is worth the cost of formally evaluating them.
Furthermore, there ought to be a reasonable prospect of a future decision to which evaluation findings
can be brought to bear at the appropriate time. Local managers should remember that program
evaluation, like anything else, is not infinitely valuable. It may serve a useful purpose in the overall
management processes of local government, but only within its technical and political constraints. Not
all government programs can or should be evaluated. (p. 47)



Program Evaluation Within California State Agencies: An Assessment. (Conner,
Ross R.; Rosener, Judy B.; and Weeks, Edward C. Irvine, Calif.: Public Policy
Research Organization, University of California, May 1976.)

In this small scale survey, 17 departments, boards and commissions were selected from among 79
in California based on a "judgment" about their high impact on social problems and/or on citizens and
on information that they were in fact carrying on some kind of effectiveness measurement activities.
Two-member teams, using a 25-question guide, interviewed 16 agency evaluators. They also read
and analyzed 43 evaluation reports.

The bulk of the 36-page report of this survey is devoted to a presentation of the answers of
evaluators and to a set of recommendations for improving the organization, centralization, visibility,
staffing, training and coordination of evaluation. Of interest here are a few findings on the perceived
utilization of evaluation results.

1. Six evaluators said results were "very well utilized," nine said "somewhat used" and one
said "very little used." The authors summarize: "Current evaluation results, then, are used
but not to any great extent." (The study gives no indication of what was meant by "use" or
"utilization.")

2. The most frequent reasons given by evaluators for the limited use of evaluation results
were: "no incentive" (4); "low reliability" (4); and "results not timely" (3).

3. As to potential benefits of evaluation in the near future, eight evaluators responded a very
good likelihood, six said a good likelihood and two said the likelihood was poor.

4. The evaluators believed that other State officials (program managers, department directors,
agency secretaries, Governor's officers and legislators) viewed evaluation as it was
"currently practiced" as "somewhat useful." They were more optimistic about evaluation as
it might be "ideally conducted."

5. Thirteen evaluators viewed the department director as the prime beneficiary of program
evaluation and 11 included the program manager. Most did not view the legislature, agency
secretary or public and program clients as beneficiaries. (pp. 12-14)

6. Few departments had conducted formal program evaluations; most of their effectiveness
measurement apparently took the form of "status monitoring." The authors thought more
formal evaluation would be undertaken in the future. (p. 20)



Comments: 



The authors clearly favored increasing formal program evaluation activities, especially through the
use of control and comparison group studies. The report contains many recommendations to
centralize, coordinate and enlarge the evaluation function. Yet there is no show of or reference to
evidence that evaluation will pay off to an agency beyond assertion, the reported beliefs of evaluators
and the inclusion of a one-page description of a California study of "some additional factors
influencing the effectiveness of warning letters" in reducing traffic accidents and convictions. Importantly,
the study does not illustrate or describe what was perceived to constitute "use" or "utilization."

"Symposium on 'The Research Utilization Quandary.'" (Weiss, Carol H., ed.
Policy Studies Journal, Spring 1976.)

Though research utilization is an area of broader concern than the utilization of program evaluation,
there are many issues of overlap and common concern. Thus, the reflections of Weiss on the
symposium are relevant.

Through a presentation of six papers, Weiss attempted to bring some government officials into a
discussion which had been dominated largely by academic social scientists. She also tried to
assemble empirical cases to offset the fact that much earlier discussion had been "impressionistic
and speculative." Of the six cases, two relate to evaluation studies. Caplan's is discussed in the next
section.

The other five included a survey of social scientists (Useem), a discussion of use (Janet Weiss), a
view of research use in the State Department (Uliassi), a case study of evaluation of several housing
projects (Banks and Clark), a case study of evaluation of an experimental education project (McGowan),
a case history of the role of research in mental hospital deinstitutionalization (Swan) and an
account of the development and use of research in regional waste water management development
(Conway et al.).

Weiss assesses the implications of the cases this way:

And what is the verdict from the six case studies about the usefulness and use of social research? Two
of the papers are unflinchingly optimistic (Conway et al. and Null), although the evidence in each case
is modest. Two find some positive effects of social research, although not necessarily what either the
researchers or the sponsors intended (McGowan and Uliassi). Two deal with what might be called
utilization fiascos (Banks and Clark, and Swan, but Swan sees hope for the future given the lessons
learned).

•** 

The theme that emerges from the total set of papers is that the use of research in governmental
decision-making is a complex and difficult matter. . . . There is work to be done to clarify the ways in
which social research can contribute more effectively to policy . . . (pp. 222-223)

Of the six cases presented in the symposium, the survey by Caplan deserves further attention.

"Factors Associated With Knowledge Use Among Federal Executives." (Caplan,
Nathan. Policy Studies Journal, Spring 1976.)

This is a summary presentation of a study reported more extensively elsewhere (Caplan et al.,
1975). Caplan and associates conducted 204 interviews with officials in Federal executive departments,
major agencies and commissions. Interviews were focused on "the use of empirically based
social science knowledge." The study identified 575 "self-reported instances of social science
knowledge use that impacted on policy decisions." Caplan cautions the reader that the findings may
be oversimplified.

1. He concluded that utilization (undefined) is most likely to occur when the decision-making
orientation of the policy maker is characterized by a reasoned appreciation of the "scientific" and
"extra-scientific" aspects of the policy issue. The "scientific" aspect refers to the "internal logic" of the
policy issue (a diagnosis of the problem). The "extra-scientific" aspect refers to the "external logic" of
the policy issue (the political, value-based, ideological, administrative and economic considerations
involved). Caplan grouped officials into three "orientations":

• Twenty percent expressed a clinical orientation. They first gather the best available information to
diagnose the internal logic of the problem. Then they gather information bearing on the external
logic of the problem and finally weigh and reconcile the conflicting dictates of the information.

• Another 30 percent of the interviewees were classified as having the academic orientation, those
who "are often experts in their field and prefer to devote their major attention to the internal logic of
the policy issue. They are much less willing, however, to cope with the external realities that
confound policymaking." They apparently use social science information in "moderate amounts"
and in routine ways "to formulate and evaluate policies largely on the basis of scientifically derived
information."

• A third group, comprising another 20 percent of the interviewees, had the advocacy orientation,
those "at home in the world of social, political, and economic realities." They reportedly make
"limited" use of social science information and "largely" to rationalize a decision made on other
grounds. (p. 230)

• The orientation of the remaining 30 percent is not provided in this particular reporting of this study.

2. Caplan reports that "the most frequent users of social science research" have a "social
perspective -- a sensitivity to contemporary social events and a desire for social reform." He
comments:

It is evident to a large extent that many respondents fail to distinguish between objective social science
information from subjective social sensitivity. Thus most of the examples which they offered to illustrate
knowledge applications really involved the application of organized common sense and social sensitivity,
which as a mixture, might be called a "social perspective." (p. 231)

Caplan reports that these officials applied a "value-laden appraisal" of policy. "Though they cited
specific social science information, the final decision whether or not to proceed with a particular
policy was more likely to depend upon an appraisal of 'soft' knowledge (nonresearch based,
qualitative and couched in lay language) --" These officials were also eclectic in their use of
information sources, relying on newspapers, TV and popular magazines as well as on scientific
government research reports and scientific journals. Caplan got "the overall impression that social
science knowledge, 'hard' or 'soft', is treated as news by these respondents -- allowing its users to
feel that their awareness of contemporary social reality does not lag behind." (p. 231)

3. Because a policy maker is often confronted with "an overwhelming number of bewildering and
complex responsibilities," research is often sponsored to help him "find his way out of this 'conceptual'
mudhole." Unfortunately, the purpose of such research is, according to Caplan, "rarely made explicit
to the researcher." Some interviewees, for example, supported the use of social indicators and they
even named some. But when asked about the uses they would make of such data, "The responses
were so scrambling and diverse that it was impossible to derive empirically based coding categories for
purposes of quantification." Caplan stresses here a precondition for the conduct of evaluation and
similar studies which is seemingly crucial: there must be some "previously agreed notion of what
purposes" are to be served by the expected results of the study. (pp. 231-232)

4. The final general conclusion by Caplan is that utilization of study results is more likely if:

a. they (findings) are not counterintuitive;
b. they are believable on the grounds of objectivity; and
c. their action implications are politically feasible. (p. 232)

Caplan points out that "objectivity" may relate both to methodology and to interpretation. He suggests
that "perhaps more than for other reasons, careless, irresponsible and shoddy program evaluations
were cited by respondents to discredit social science research." He notes that "The ultimate test of
data acceptability is political. Rarely are data in their own right of such compelling force as to override
their political significance. This is an ancient issue and much has been written on it; it remains
important."

In concluding his analysis, Caplan observes that the conditions which appeared in his study to
influence utilization overlap and appear to be "somewhat contradictory."

It does appear, however, that the major problems that hamper utilization are nontechnical. That is, the
level of knowledge utilization is not so much the result of the slow flow of relevant and valid knowledge
from knowledge producers to policy makers, but is due more to factors involving values, ideology and
decisionmaking styles. (p. 233)

This study identifies the orientation of an official as an influence on the types of knowledge that are
sought and used. It does not, however, define use, utilization, or impact. It apparently accepts the
self-reports of respondents. Is use merely reading a report? Or is it a change in the understanding of
the reader? Or is it an action which would not have been taken in the absence of a study? Or
something else? The possible alternative meanings of "use" leave us guessing about some of the
implications of the Caplan survey.

Second, it is not clear from this reporting what proportion of those with an "academic orientation"
(the most frequent users of "social science knowledge"), for example, were in research, analysis and
evaluation positions in which the nature of their jobs and roles required the use of social science
sources.

Third, the study underscores the fact that the overwhelming majority of officials use multiple
sources and types of information and that the dominant use of information is political.

Finally, Caplan could not conclude from this study that the "relevance and validity" of knowledge
does not inhibit its use. He seems to believe that there is an adequate flow of valid and "objective"
social science knowledge relevant to many (most?) policy problems.

Assessment of State and Local Government Evaluation Practices in Human . 
Services. (Baumheier, Edward C, et al. ^Denver: Center for Social Research and 
Development, February 1 977.) • 

The Center for Social Research and Development, University of Denver, conducted this study of
evaluation practices for the Office of the Assistant Secretary for Planning and Evaluation, DHEW. The
purpose of the study was to assess the evaluation practices of State and local governments in areas
of human services and to provide these governments with critical assessments of "various
organizational structures, methodological techniques, and operational procedures for conducting and
utilizing program evaluations."

Three-day visits were made to nine States "selected as good examples ["exemplary"] of
evaluation units located in a wide variety of organizational structures within State and local
governments." The sites included evaluation units in the Department of Health and Rehabilitative
Services, Florida; Department of Public Welfare, Texas; San Diego County, Calif.; Human Resources
Administration, New York City; Hennepin County Mental Health Center, Minneapolis, Minn.;
Vocational Rehabilitation Service, Lansing, Mich.; Office of the Governor, State of Washington; and the
Joint Legislative and Review Commission, Commonwealth of Virginia.

The main conclusion of this study is that "decision makers at all levels benefit by having an
evaluation unit available to them to provide specific information about programs in their areas of
responsibility." Though too numerous to reiterate in full, here are several of the study's specific
findings:

Some evaluations were initiated to identify program problems, others to justify the value of a
program, and a few without a clear purpose in mind. Sources of initiation included the
legislature, the executive, the program to be evaluated and the evaluation unit itself.

Activities identified as "evaluation" took many forms. The two most common were (a)
evaluation research which tended to follow experimental methodology, and to be outcome
oriented, summative in nature and limited in scope; and (b) performance monitoring which
tended to be formative evaluation of the service delivery process and descriptive rather than
experimental.

Performance monitoring was found more prevalent, addressed more practical concerns,
and was utilized to a greater extent than evaluation "research." Study recommendations
suggest that experimental research be left to the Federal Government, while States and
localities pursue performance monitoring.

The evaluation units that worked the most actively to promote utilization were the units
whose evaluations were the most utilized. They sought approval from decision makers for
their recommendations, developed plans for implementing recommendations, provided
technical assistance for implementation, and checked periodically on progress toward
implementation.

Three conclusions were reached about the transferability of evaluation activities: 

First, none of the specific findings of the evaluation case studies are directly transferable to
other settings. This is true because no two human service programs are alike. Even categorical
programs are administered in widely divergent fashions across the country.

Second, few of the specific evaluation methodologies utilized in the case studies are directly
transferable to other settings. This is true because performance monitoring does not follow as
structured a set of procedures as experimental research.

Third, the general experience of the sites in establishing and operating evaluation systems are
clearly transferable to other settings. (pp. 4-5)

Experience in individual sites is recounted in a set of nine case studies which accompany the main
report.

Comments:

This is one of the few studies which attempts to describe what local and State evaluation units are
actually doing in the name of evaluation and with what degree of perceived success. The site reports
are worth the time of those who want to establish a new or strengthen an existing evaluation
capability.

The criterion of evaluation impact used in the study was the combined judgment of the evaluation
unit and the field researcher. Together they selected one evaluation study of apparent high impact
and one of relatively low impact and then examined the factors seemingly associated with each. It is
unclear how the resulting nine studies of high impact and the nine of low impact compare with the
dozens of others carried out by the subject evaluation units.

The study tends to confirm the general conclusions reached elsewhere that the context of
programs varies widely, that specific evaluation methods and findings cannot be transferred
wholesale from place to place and problem to problem, and that evaluation approaches and methods
have to be tailored to specific programs and problems in specific contexts. The case studies indicate
that the performance monitoring recommended by this study consists of descriptive studies of
program and management practices and processes. The studies clearly are not "scientific"
evaluation research on the "causes" of program effects. They fit more closely the traditional mold of
organization, operations, and management studies and analysis, the kind that State and local
program leadership apparently find the most desirable and useful.

Interim Analysis of 200 Evaluations of Criminal Justice. (Larson, Richard C., et
al. Cambridge: Operations Research Center, Massachusetts Institute of Technology,
May 1979.)

This is one part of a larger study of methods used in criminal justice evaluations. It is based on a
"structured sample" of 200 of the "best" evaluations selected from among roughly 1,500 studies
identified as evaluations in the National Criminal Justice Reference Service (NCJRS) in late 1977.
Fifty percent of the sample was intentionally selected from "logistical" programs in which "the
movement of persons, material or other entities was an important element." The other 50 percent
came primarily from "social service type programs in which counseling or some other type of service
is provided to one or more client groups." The sample also reflected three different Law Enforcement
Assistance Administration (LEAA) evaluation efforts: evaluations of information in an area (police
preventive patrol), exemplary projects nominated for wider replication, and LEAA "anti-crime impact
cities" programs. The sample also focused on studies which "purported" to use "certain current
methodologies" such as time series analysis, experimental design, models, decision analysis, etc.
(pp. 5-7)

About 1,500 NCJRS document summaries were reviewed and graded subjectively on a scale from
A to D. Those studies with the "highest grades" (the "best") were selected. Readers spent roughly 4
hours with each evaluation report and completed a checklist of 31 entries to "obtain information
regarding evaluation input, process, and outcome, and to assess in a general way the relevance of
the methodology employed, and the quality of the documentation." (pp. 10-15)

The study team notes that they were concerned with the use of evaluations by decision makers, the
likely value of the evaluation information generated, the misuse and abuse of quantitative methods
and the use of adaptive evaluation methods to respond to feedback "from the field." Because
adequate information on use was not available in the documentation examined, however, the team is
administering additional questionnaires to evaluators and "consumers" of evaluation reports. They
hope to document the "budgeting, timing, planning and design of evaluation (inputs), interaction
between program staff and evaluators, e.g., communication (process), and the ultimate use of the
evaluation." This interim report contains many summary descriptive statements about the
information provided in the evaluation documents examined. Many relate to issues of technical
methodology which are not our concern. Only a few of the two and one-half pages of tentative
conclusions (pp. 68-70) are of interest here:

Target population was not discussed in one-third of the sample. "A slight majority of
reports did not consider whether the program had been implemented as designed,
and description of program activities is frequently inadequate as well."

Experimental and quasi-experimental designs were the most common types,
followed by narrative case studies; there was little use of statistical or formal models.

"The most widespread problems were misapplication of common statistical
techniques and difficulties in attributing outcomes to program activities; i.e., poor choice
of performance measures.

"There is a generalized lack of documentation of data collection procedures, and
data were sometimes poorly used once obtained. A complementary problem is poor
presentation, more so in qualitative than in quantitative studies." (pp. 68-69)

The report concludes that many of the problems identified "are manifestations of the basic problem
with the criminal justice evaluations in our sample, namely that quite frequently the evaluation
methodology used is not well matched to the type of program being evaluated." (p. 69)

The report recommends the use of:

well-structured hypotheses or mental models concerning how the program should work. It is very
important that the evaluator have some notion of how program activities are linked to desired outputs
and to other social, economic and political activities in the subject community. In many instances, the
use of statistical or other formal models would help immensely. The point of stressing the need for
articulated hypotheses is to wean evaluators away from the textbook formulas to which they were
taught to adhere with little regard for circumstances. (p. 69)

In addition, "difficulties in applying various types of social science methods and measures were
frequently manifested. . . . Common sense occasionally gets lost in the pursuit of elegant methods."
(p. 70) The study team is pursuing better documentation of the input, process and outcome
characteristics of the 200 evaluations.

Comments:

This report is primarily oriented to formal technical (social science research) issues related to study
design, formal methods, data use, etc. Evidence already presented in this monograph suggests that
the emphasis on formal methods has been grossly exaggerated. The usefulness and impact of
evaluations seem related more closely to the articulated, situational and felt information needs and
options of intended users and decision makers. The qualitative commentary in this report seems to
bear out the point. This interim study seems to rest partly on the dubious assumption that the more
formal the methods employed, the more useful the evaluation. Beyond some minimal level of credible
methods, this assumption is doubtful. The next two studies provide additional reasons.

"Lessons Learned From Federally Mandated Program Evaluation for Community
Mental Health Centers: Framework for a New Policy." (Flaherty, Eugenie
Walsh, and Windle, Charles D. Submitted to Evaluation and Program Planning, May
1980.)

This paper examines assumptions that appear to underlie the extensive program evaluation
requirements of the Community Mental Health Centers (CMHC) Amendments of 1975 (P.L. 94-63). It
discusses four alternative evaluation models and their "sometimes contradictory purposes" and the
conflicting motivations and values about evaluation held by key parties in evaluation. The authors
then propose nine "principles" to guide future Federal CMHC evaluation policy and suggest ways to
guide policy on accountability and program improvement.

This study is one of the few apparent attempts to examine critically the experience with a set of
federally mandated program evaluation requirements for a specific program and to infer lessons and
guidance from that experience. It draws on a wide variety of evidence and experience including a
1978 study by Flaherty and Olsen of evaluation in nine CMHC's funded by the National Institute of
Mental Health (NIMH) and conducted by the Philadelphia Health Management Corporation.

Several of the authors' observations and conclusions are of interest. They cite, for example, the
findings of three studies and conclude that, Federal fears aside, CMHC's would continue to do some
evaluation work even if Federal requirements were removed. Centers would reportedly reduce the
amount of evaluation by eliminating activities that are not "stimulated by center need." They doubt
that program self-evaluation will contain costs and report that "program evaluation generally has
been used to justify program expansion rather than program contraction." They also conclude that
the "stringent evaluation requirements in P.L. 94-63 were based on assumptions that are only
unevenly supported by available evidence and analysis" and "may not be justified." (p. 3)

The authors identify four alternative models of evaluation (amelioration, accountability, advocacy
and traditional research) and conclude that there is "little evidence" on which purposes and use
evaluation is actually put to in CMHC's. They conclude that "external pressure to do evaluation is
associated with minimal utilization; only when evaluation is initiated because of centers' own felt need
is evaluation judged very useful (Flaherty and Olsen, 1978)." (p. 7)

The authors judge the use of evaluation for advocacy purposes "of doubtful integrity and long-run
value for improving the quality of mental health services, although it has some immediate value for
program viability." (p. 8) They also believe that the "traditional research model" of evaluation conflicts
with the other three models partly because:

it is likely to displace these applied forms of research, because it is of more interest and personal 
value to program evaluators, thereby shifting the topics, approaches and funds away from program 
relevance and use. (p. 10) 

Centers apparently comply only "minimally" with a requirement for an annual evaluation report for
citizens. Few mechanisms for communication between centers and citizens exist and lack of
compliance apparently springs "most importantly" from a "lack of citizen pressure, knowledge, or
interest (Flaherty and Olsen, 1978)." (p. 8) The authors conclude that "These four models of
evaluation are incompatible." (p. 9)

Flaherty and Windle take the reader through a parallel discussion of the varied and often conflicting
"motivations and value systems" of "key parties in evaluation": center administrators, clinicians,
citizens, service consumers and evaluators. Evaluation is most beneficial to administrators when it
can be used, alternatively, to satisfy external requirements, describe the center to outside groups,
assist management decision making, bring prestige and respect as evidence of serious efforts at
self-management, increase the administrators' control of staff or visibly increase their ability to
improve the center. (pp. 20-21) The authors summarize:

These benefits occur most immediately when evaluation is conducted under the Advocacy Model, and
next most quickly under the Accountability Model directed at funding and governing agencies but not at
citizens. Benefit is most delayed and diluted in impact when evaluation is conducted under the
Amelioration and Traditional Research Models, which take long to generate findings and are uncertain
in results. (p. 21)

Finally, the authors note that several studies suggest that the Community Mental Health Center
Amendments of 1975 "require evaluation far in excess of centers' capacity and resources." (p. 22)






Flaherty and Windle derive nine "principles" for Federal policy for CMHC evaluation. Paraphrased 
and in summary form, they appear to suggest that evaluation requirements should: 

• Be feasible and "not exceed by much the capacities of agencies to comply."

• Be flexible to accommodate varying programs' processes and evaluation topics, purposes and
methods, and to permit discretion about what, when and how to evaluate.

• Focus on accountability to the public and be limited to a few issues of importance, especially
descriptions of what was accomplished and not program judgments about what was done.

• Not require "studies of client outcome" that are too expensive and complex and should be left
instead to "special research."

• View evaluation as developmental and not require uniform and standard evaluation activities
from programs at many different stages of development.

In addition, requirements should safeguard the confidentiality and dignity of program clients and
staff, provide for routine dissemination and publicity of evaluation results, provide for evaluation of the
evaluation activities themselves, and provide independent support for citizen participation in
evaluation. (pp. 23-27)

Comments: 

This paper is worthwhile reading for the lessons it conveys to anyone considering mandating
evaluation requirements from higher to lower levels of program and/or government. It adds
weight to the view that outcome studies should not be routinely mandated of local programs and
reinforces the position that evaluation serves local programs best when it satisfies locally defined
purposes and uses.

Utilization-Focused Evaluation. (Patton, Michael Quinn. Beverly Hills, Calif.:
Sage Publications, 1978.)

This is a wide-ranging, well-illustrated and probing discussion of evaluation. Of interest here is a
study of the utilization of 20 Federal health evaluations that serves as part of the basis for Patton's
proposed practices to increase the likelihood that evaluation results will be utilized.

In the fall of 1975, Patton and participants in an evaluation methodology training program at the
University of Minnesota conducted inductive followup case studies of 20 "examples of excellence" in
national health evaluations "selected from among 170 evaluations on file in the Office of Health
Evaluation, DHEW." (Less than half the 170 studies qualified as "evaluation research" since many
were found to be "nonempirical think pieces or policy research studies aimed at social indicators in
general rather than evaluation of specific programs.") The 20 evaluations included 4 mental health
center activities, 4 health training programs, 2 national assessments of laboratory proficiency, 2 of
neighborhood health center programs, 2 studies of health services delivery systems programs, 1
alcoholism training program, 1 health regulatory program, 1 Federal loan forgiveness program, 1
training workshop evaluation, and 2 evaluations of specialized health facilities. Six of the 20 cases
were internal evaluations, 13 were conducted by outside groups and 1 was done by one Federal unit
for another. They ranged from a one-person 3-week program review to a 4-year evaluation which cost
1.5 million dollars.

Three "key informants" were intensively interviewed about the utilization of each of the 20 cases:
the study project officer, the person identified by the project officer as the decision maker for the
program or the person most knowledgeable about the study's impact, and the responsible evaluator.
Most of the decision makers were office directors (and deputies), division heads or bureau chiefs.
Interviews averaged 2 hours and ranged from 1 to 6. They were taped and transcribed. Three staff
members independently analyzed the transcriptions for patterns and themes. Hypotheses were
formulated and interviews were re-examined for relevant evidence, pro and con.

Interviewees were permitted to define impact in their own terms for these exemplary evaluations.
Seventy-eight percent of the decision makers and ninety percent of the evaluators felt that the
evaluation had had an impact on the program. Eighty and seventy percent, respectively, felt there
were also "non-program" impacts. Perceived impacts were not, however, the kind where new
evaluation findings "led directly and immediately to the making of major, concrete program
decisions." Patton reports:

The kind of impact we found, then, was that evaluation research provided some additional information
that was judged and used in the context of other available information to help reduce the unknowns in
the making of difficult decisions. The impact ranged from "it sort of confirmed our impressions,
confirming some other anecdotal information or impression that we had" (DM209 7,1) to providing a
new awareness carrying over into other programs. . . . (p. 30)

. . . utilization is a diffuse and gradual process of reducing decision-maker uncertainty within an existing
social context (cf. Levine and Levine, 1977). (p. 34)

Patton concludes that utilization of evaluation studies can be increased and better targeted but that
the results will be more modest than rationalizing decision-making processes.

Throughout the book, Patton painstakingly reiterates that the touchstone of an evaluation that is
likely to be useful is not the evaluator's theories, methods, specification or interpretation of program
goals or evaluation criteria, but rather:

The first step in the utilization-focused approach to evaluation is IDENTIFICATION AND
ORGANIZATION OF RELEVANT DECISIONMAKERS FOR AND INFORMATION USERS OF THE EVALUATION.
(Emphasis in the original, p. 61.)

Patton stresses the importance of what he calls "the personal factor," which emerged unexpectedly in
the study of 20 health evaluations.

To target an evaluation at the information needs of a specific person or at a group of identifiable and
interacting persons is quite different from what is usually referred to as "identifying the audience" for an
evaluation. Audiences are amorphous, anonymous entities. Nor is it sufficient to identify an agency or
organization as recipient of the evaluation report. Organizations are an impersonal collection of
hierarchical positions. People, not organizations, use evaluation information. (p. 63)

He reiterates:

The specifics vary from case to case but the pattern is markedly clear: where the personal factor
emerges, where some individual takes direct, personal responsibility for getting information to the right
people, evaluations have an impact. Where the personal factor is absent, there is a marked absence of
impact. Utilization is not simply determined by some configuration of abstract factors; it is determined
in large part by real, live, caring human beings. (p. 69)

In the last chapter, Patton summarizes "utilization-focused evaluation":

There are only two fundamental requirements in this approach; everything else is a matter for
negotiation, adaptation, selection, and matching. First, relevant decisionmakers and information users
must be identified and organized: real, visible, specific and caring human beings, not ephemeral,
general and abstract "audiences," organizations, or agencies. Second, evaluators must work actively,
reactively and adaptively with these identified decisionmakers and information users to make all other
decisions about the evaluation: decisions about research focus, design, methods, analysis,
interpretation, and dissemination. (p. 284)

Between the summary of his study of 20 evaluations and the closing chapter, Patton takes the reader
through a wide array of issues, illustrations, study evidence, theory, anecdotes, personal experiences
and basic topics including "focusing the evaluation question," "the goals clarification game," "the
methodology dragon," etc.

Comments:

This is a pragmatic, realistic, carefully stated and broadly based discussion of evaluation. It is laced
with the lessons of experience and common sense and is highly recommended. Much of Patton's
advice is similar to or consistent with the guidance given in part VI of this monograph.

Other Studies 

Our search identified a few other studies, usually funded by the Federal Government, that
examined State-level evaluation activities either as a single focus or as part of a broader look at
program management. Typically, however, these studies appear to accept evaluation activities at
face value. They do not explore use or impact, but nonetheless conclude by urging more evaluation.

For example, Pacific Consultants (February 1977) made site visits to eight States and one Federal
Region and surveyed the remaining States by phone. In this study of social service evaluation under
Title XX sponsored by the Social and Rehabilitation Service of DHEW, "level of evaluation
performance" was indicated by the "number of studies completed or in progress." High performers
were defined as States with 6 to 18 evaluation studies; moderate performers with 1 to 3 studies; and
low performers with no studies. An examination of the 6 high performer States suggested that they
tended to focus on impact studies; identified "program planning and improvement" as the primary
purpose of evaluation; had planned substantially for social services; had special evaluation units that
were "broad-scope" and relatively large (nine or more full-time equivalent staff); and had at least
$150,000 available for evaluation. The study cautions that the descriptive factors they examined
were not fully explanatory and that "a number of factors included in the model must coalesce within
the same state to produce significant probability of high performance." (Emphasis in the original, p.
16.) The study also identified 81 evaluation studies that were either completed or in progress: 17
management, 11 client characteristics, 34 process and 19 impact. The contractor saw an increase (at
least short-run) in the number of process and impact studies.

While this study concludes that there is an overall improvement in the state of the art of evaluation
since the implementation of Title XX, no attempt was made to assess the utility, impact or use of the
studies already completed or the conditions associated with that use. Here, as commonly elsewhere
in the evaluation literature, the general value of formal evaluation studies is taken for granted and the
dubious assumption often made that the more social scientific the better (usually in terms of
traditional methodological characteristics). Oddly, these studies, which are often cloaked in the
semblance of "science," appear to rest on circular reasoning and do not explore or sometimes even
raise the basic questions: Of what value, worth, use, impact or relevance were these studies? To
whom? Compared with what?

Finally, the Urban Institute conducted a 2-year study of State implementation of Federal Title XX
social service programs. (Benton, Feild and Millar, 1978) This study also reported that there was
"optimism" among State-level interviewees that "the use of evaluation data would increase over the
next 3 years." More self-consciously than some other studies, however, this one at least questioned
the assumption that producing more evaluation data will result in its use in decision making
processes.

There are surely other studies of the use and impact of program evaluation that have been
overlooked. Readers acquainted with them are urged to add the evidence to what has been
presented here and come to their own conclusions.



General Conclusion 

In an attempt to uncover evidence on the actual use, utility and impact of formal program
evaluation, we searched for and screened a large volume of documents and studies. We did not find a
body of valid, scientifically verified evidence which upholds the many claims for the value of formal
evaluation. We found, instead, about a dozen or so assorted studies that bear on this issue and
selectively summarized them. On the whole, they suggest a small, uneven, and modest use and
impact of formal evaluation studies as these studies have been initiated, designed and carried out in
the past. They also point to some practical tips for the practitioner.

Drawing on this eclectic body of evidence, the testimony of experienced evaluators, discussions
with experts and our own experience, we give in part VI some general guidance, suggestions and
rules-of-thumb that might help the State and local agency official, manager and practitioner decide
what to do when confronted with decisions about conducting formal evaluation. Before identifying
some of the proposed reforms to traditional evaluation theory and practice, we suggest a few general
lessons that the Federal experience with program evaluation might suggest to other levels of
government.



Intergovernmental Lessons

The U.S. system of federalism provides opportunities for trial and error and for cumulating
experience with an approach in a circumscribed way short of universal adoption or application. In
principle at least, learning from these experiences may be transmitted to other levels and locations in
the system. One part of the social or governmental system may then learn from the successes or
failures of another. These learning experiences are possible and have occurred historically in
multilateral directions (many from State and local levels to the Federal level). Some
intergovernmental and intersector borrowing of practices, however, does not appear to be based on
learning but rather on copying and mimicry. In these instances, untested claims for an approach may
continue to run well ahead of the caveats of experience. The Federal Government may have cycled
through the adoption, use and adaptation of an approach like PPB. The States, by contrast, may be
starting a cycle with the same premises that the Federal Government may have already abandoned or
modified.

History may be at or beyond a similar point in the case of program evaluation. Federal expectations
and practice are much more modest now than in the late sixties or early seventies; claims have been
substantially muted by the force of experience. Yet reports suggest that some States have begun to
copy not the recent but the earlier Federal experience without the benefit of the lessons already
learned by the Federal Government. What, then, are some of the lessons about the use and practice
of program evaluation that might contribute to a satisfactory intergovernmental learning experience?
Here are some that appear to transcend the operational suggestions given later in part VI.

1. Be selective in the requirements for and use of formal program evaluation. Do not mandate
program and project evaluation requirements (through laws, regulations and other rulemaking)
uniformly and comprehensively for every program. This will lead inevitably to redundancy,
waste and the diversion of some resources from more useful management purposes.
Not every program can or should be evaluated formally.

2. Do not expect that formal program evaluation will yield satisfactory overall conclusions
about "all-or-none" questions or about the overall worth and value of programs. These
judgments emerge from social and political processes and not from formal studies.

3. Do not mandate outcome evaluation studies. They are expensive, complex and often
impossible. These efforts are best left to special applied research, probably conducted
most reasonably on the national level.

4. Do not mandate any single evaluation methodology, ideology, approach or method.
Appropriate tools and approaches for formal evaluation should be suited and fitted to
specific program problems and to the information needs of a wide variety of agency
administrators, program officials and other influentials whose circumstances differ
substantially.

5. View evaluation in a broad sense as study (rather than formal research) and include
management, policy, operations, procedural and workforce efficiency and effectiveness
studies.

6. If program evaluation requirements are to be established, make them selective,
restrained, permissive and enabling. They should not be universal, ambitious, restricting,
detailed and compulsory.

7. Do not mandate evaluation because it will be good for the other guy. If the officials of a State
or local agency do not intend to use the results of evaluation in concrete ways, they should
not mandate it for others.

8. Be modest in expectations about the payoff of formal evaluation for resource allocation and
program management decision making.

9. Keep program evaluation in perspective. Reexamine the evidence about its likely payoff.
Think about it.




Human Services Monograph Series • No. 18, April 1981






V. Proposed Reforms of Traditional 
Formal Evaluation 



Practical experience with formalized program evaluation and its seemingly small payoff has led a
variety of self-conscious observers, practitioners and commentators to propose reforms for the
theory, practice and role of evaluation. Though some of these reforms may have no immediate
practical implication for the State and local practitioner, they reflect how profoundly the area of formal
program evaluation is under reconsideration and transformation. The proposed reforms are too
numerous to treat in full or in detail; here are a few of the dominant ones.

Sustain a Reasonable Measure of Self-Evaluation

Past evaluation philosophy has emphasized that the "best" evaluation is done from outside the
program, either at a higher level in an organization, or by a body, group or institution beyond the direct
influence of the program to be evaluated. This advice appears to be based on the joint premises that
(a) agencies and programs left to self-evaluation will be self-serving and biased ("they can't be
trusted"), and (b) those outside will have no vested interest and will be more impartial and unbiased.
Both these premises may be faulty. First, there is no pure, unbiased, or value-free evaluation. There
are only many perspectives from which different value judgments may be made, some more
persuasively than others. Outside judgments are not always more compelling than those inside.
Second, holding program officials responsible and accountable for a program and its performance
should entail giving them some share of the responsibility, encouragement and resources for
self-evaluation.

In general, using mixed approaches of both inside and outside evaluation seems more sensible than
using either one exclusively. It is significant that while criticism and pressure for program reform may
come from outside, many reforms can only be effected by those inside. Reforms that find their source
partly on the inside may occur more adaptably, more effectively and more enduringly than those
invented elsewhere. This position urges restoring the respectability of internal or self-evaluation in
combination with other varieties.

Support Competitive Evaluations

This reform proposal can be viewed as an extension of the first one. It acknowledges that all
individual evaluations will be partial, spring from some value position and be without much external
cross-checking. To increase the range of both analytical and value input, several competitive
evaluations of the same program are urged. Out of this competitive adversarial process will come, it is
argued, better cross-checking and error-correction than is possible with a single try. This position
appears to rest on the general logic that underlies adversarial judicial proceedings, competitive
markets, and much of science (Polanyi, 1964; Toulmin, 1972; Fleck, 1979; Judson, 1979). A practical
implication of this proposal is the risk-spreading mentioned earlier: do several small evaluations of
different program dimensions or problems rather than one intended to be global or comprehensive.
The competitive evaluation position also urges that multiple evaluations of program activities
consciously reflect major alternative views of or positions on program problems and issues. In
over-simplified terms, one evaluation might be undertaken by a provider-oriented group, another by a
client-oriented group and a third by a finance/management-oriented group. Or, one evaluation might
examine component A of a given program, another component B, and still another component C. In
still another situation, a given program or one of its components might be evaluated simultaneously
from two or three competitive political or ideological positions. It is presumably as a result of
competitive evaluation and policy analysis that more reliable and relevant information and remedies
would emerge.

Improve Citizen and Client Participation in Program 
Evaluation 

This proposal, a variant of competitive evaluation, is based on the fact that most resources and
responsibility for existing formal public program evaluation now lie within the control of executive
agencies. Formal evaluation efforts of these agencies are, it is reasonably suggested, heavily
influenced by motives of self-maintenance and stability. They also tend frequently to be oriented
toward existing service provider arrangements, affiliated organizations, and professional groups and
associations, and toward dominant existing commercial and financial interests. Amidst the din from
these politically active program stakeholders, the voices of the client and the citizen-taxpayer are
often muffled, if not lost. Though appropriate detailed mechanisms are not clear, the intent of this
proposal is to increase the role of citizens in program evaluation. One specific recommendation is to
make program evaluation results more accessible to the public. A stronger proposal is to make some
share of evaluation funds and resources directly available to citizen and client-oriented organizations
and associations (Flaherty and Windle, May 1980).

Re-Examine Traditional Evaluation Premises 

This reform proposal calls for a re-examination of the "fit" between (a) formal "rational/scientific"
modes of information gathering and knowledge building and (b) the problem solving and program
evaluation tasks that actually confront real-world operating agencies and programs. McLaughlin
(1980) believes that "efforts to 'fix' existing evaluation paradigms [the experimental and input-output
models] are unlikely to be fruitful." She concludes that (a) "many of the important factors in the local
process of change may be inherently unquantifiable and not amenable to control," (b) "the logic of
inquiry is wrong," and (c) "fundamental incongruence between the set of relationships presumed by
our current logic of inquiry and the local reality has led to spending much time and energy in
developing new instruments to measure outcome and calibrate inputs. These efforts typically are
undertaken at the expense of rethinking the conceptual framework for learning from project
experience." (pp. 45-46)



Lindblom and Cohen (1979) have also taken a fundamental and critical look not just at social
science-based evaluation, but at the larger class of what they call "professional social inquiry."
Similarly in England, Sharpe (1976) has critically examined the relationship between the social
scientist and policy making.



Conduct a Pre-Evaluation or Feasibility Assessment 

This reform has been developed most extensively by Wholey (1979) and Schmidt, Scanlon and
Bell (1979). Wholey, for example, cautions an agency not to rush to intensive evaluation until it has
gone through some preliminary or pre-evaluation steps. He suggests a "sequential purchase of
information." In order of apparent increasing commitment of resources, the steps are: evaluability
assessment, rapid-feedback evaluation, performance monitoring and intensive evaluation. Wholey
characterizes his proposed incremental sequence this way:

Rather than proceed directly from the program to be evaluated to intensive evaluation of program
effectiveness, we insert one, two or three preliminary evaluation steps, any one of which may produce
sufficient information for policy or management decisions. Our approach produces relatively
inexpensive information on program performance, within months rather than years. (pp. 13-14)

The next and final part provides guidance to the practitioner confronted with a decision about doing 
formal evaluation. 



VI. Evaluating the Expected Value of
Doing a Formal Evaluation 



Why Carry Out a Formal Evaluation Activity? 

There are several possible alternative purposes: compliance evaluation, formal social research,
miscellaneous purposes and problem-oriented evaluation.






Compliance Evaluation 






In the last 15 years, governments have increased legal requirements in laws and regulations for
formal organizational functions such as planning, needs assessment and evaluation (Zangwill, 1977;
Knezo, 1974). These mandated activities are often preconditions for new or continuing financial
support. Many plans, needs assessments and evaluations, however, are created primarily for
compliance purposes, and not primarily for the value they may have to a sponsoring agency (for
example, National Institute of Mental Health, 1977; Kimmel, 1977; and Lovell et al., 1979).
"Compliance evaluation" is likely to entail the minimum and sometimes symbolic effort required to
achieve compliance. Government evaluation requirements may, however, allow a number of
alternative evaluative and analytical activities. It may or may not be possible to generate benefit to the
agency while still complying.

Formal Social Research 

Some social scientists and professional evaluators apparently view the availability of funds for
program evaluation as an opportunity for social research, somewhat independent of its payoff for
policy and management purposes. This justification has been offered for work on social indicators
and social surveys and for methodology development. One result of publicly funded evaluation
studies in the past may have been a test of the utility and relevance of formal research methods
applied directly to operational program issues and policy problems. The result seems to be that the fit
between these two sets of activities is poor, perhaps even counter-productive (Lindblom and Cohen,
1979). Some general social benefit, however, may have been derived (in the form of social learning)
from exposing large numbers of researchers to the actual processes and complexities of social
problem solving and policy formulation, and from simultaneously giving policymakers and program
officials a better appreciation of both the possibilities and the limitations of formal research methods
applied in a public policy setting.

Miscellaneous Purposes

Beyond compliance and research lie a broad range of other possible reasons (motives) for
considering some form of evaluation activity:

1. To confirm what is already known or suspected about a program, either its weaknesses or
strengths;

2. To stimulate political response to a program by pressuring it, generating legitimacy for it or
stimulating further advocacy support for it;

3. To generate field feedback in the form of site-visit reports, case studies, program
illustrations or descriptive information for use in agency program documents or justifications;

4. To contribute to a general background of information or "enlightenment" (e.g., Weiss, Fall
1977);

5. To emulate cosmetically the practices of "scientific management;"

6. To play out what Morrill and Francis (January 1979) identify as this "... syndrome; we have a
problem, we don't know exactly what it is and don't have time to think it through, so let's get a
study to figure it out" (p. 28); and

7. Other reasons and motives recognized by the reader.

The purist rational evaluator might object that some of these possible reasons (motives) for
program evaluation are political and that evaluation should be "free of politics." The realist might
respond that public agency evaluation that is free of all politics is likely to be free of all relevance.

Problem-Oriented Evaluation 

This type of evaluation derives from felt problems and issues, the exploration, clarification and
amelioration of which may be enhanced by some form of evaluative activity.

Remedies for these problems, often problems of management, process, procedure and practice,
do not lie in the establishment of "scientific facts" through comprehensive research, but in a more
circumscribed, pragmatic and problem-oriented mode of identifying issues, articulating their
structure, and finding or inventing feasible and practical remedies to reduce or resolve them. For the
exploration and remedy of these varied problems, no single, universal approach, method or tool
exists beyond perhaps observation, thought, reflection, and common sense tutored by experience
and trial and error. In State and local government settings, some issues and problems direct attention
to the use of trouble-shooters, management analysis, operations analysis, descriptive studies, trend
analyses (of costs, service utilization, staffing patterns, and so on), rapid feedback explorations and
performance monitoring. Other problems point to a broad array of interactive problem-solving
mechanisms. Some may benefit from both interactive and formal study approaches.

The next three sections present suggestions for initiating program evaluation activities. Summary
guidance is first outlined in table 4.

General Guidance for Program Evaluation 

Expectations

If your expectations about the payoff of formal program evaluation are very high, lower them. If you
expect to derive "scientifically verified" facts and conclusions, you will be disappointed. You are more
likely to be satisfied if you expect small and not large additions to your understanding of a given
program, its problems and possible remedies; partial reality testing and not global confirmation (or
refutation) of your beliefs and opinions; and a supplement to (sometimes small) rather than a
substitute for the information, knowledge and feedback which already exists.

Bias 

While you may be able to control willful bias and blatant value loading, all program evaluation and
performance monitoring activity is selective, value influenced and value embedded. If there were no
values guiding an evaluation, of what possible interest and use would it be?
Table 4

Guidance for Program Evaluation

A. Motives for Evaluation:

   Acknowledge the many possible alternative motives for evaluation.
   Decide which ones suit the immediate situation.

B. General Guidance:

   1. Expectations: Be realistic. Keep them moderate.
   2. Bias: Control what you can and be alert to what you cannot.
   3. Scale: Break potentially large studies into several small ones.
   4. Abstractness: Aim studies at concrete well-defined issues.
   5. Beneficial interactions: Maintain moderate levels of regular
      interaction among sponsors/users and evaluators.
   6. Risk: Reduce risk of study failure by spreading it.
   7. Politics: Expect them and make the most sensible use of them.

C. Pre-Evaluation Preparations (Homework):

   1. Identify specific, concrete program problems and issues;
      consider a "program evaluation issue paper."
   2. Identify and interact regularly with expected users.
   3. "Scout around" to get some feel for evaluation possibilities.
   4. Consider several alternative types of possible evaluation:
      "Quick and Dirty," Rapid Feedback, Exploratory, or Problem-Oriented.

D. Useful Practices and Rules of Thumb:

   1. Fit tools to problems (and not vice-versa).
   2. Know your evaluator(s).
   3. Consider evaluation an interactive and negotiated process.
   4. Do not isolate the evaluator(s).
   5. Demand/prepare intelligible reports.
   6. Ask evaluators to include qualitative reporting and judgments.
   7. Keep evaluators involved in technical assistance.



Scale 

A mix of several small program evaluation studies and activities directed at the same program is
probably better than one large one. Studies of narrow scope are more likely to be focused, feasible
and manageable. They are also more likely to pay off in terms of relevance, currency and cost (in both
time and money).
Abstractness

Evaluation aimed at concrete, well-defined issues and areas is likely to be more useful, though
maybe less dramatic, than open-ended studies guided only by an abstract interest in how well a
program is serving "the public interest," meeting "comprehensive community needs," or achieving
broad and diffuse goals and objectives.

Beneficial Interactions

A moderate degree of sustained interaction between evaluators and their sponsors (or intended
users) is likely to result in better mutual understanding of the logic, possibilities and limitations of an
evaluation; permit better tailoring of study scope, focus and method to felt problems, concerns and
intended uses of evaluation results by agency personnel; induce a larger exchange of qualitative
information; and reduce the surprise and potential threat of study findings.

Risk

Breaking a potentially large and comprehensive evaluation into smaller components is one way to
reduce the risk of failure by spreading it. It is also a way to more easily fit tools, approaches and skills
to varying dimensions of an evaluation problem. Similarly, it avoids putting all evaluation eggs in one
methodological basket. This strategy was consciously employed at the Federal level by the National
Institute of Education (NIE) when it answered a mandate from Congress for an evaluation of
compensatory education programs. Study Director Hill reports that deadlines helped them spread
risk:

The deadlines also forced us to define simple projects that could be designed, put into the field, and
reported quickly. We mounted a large number of small projects, each designed to accomplish a simple
objective, rather than a few complex multi-purpose studies. That practice had several advantages. It
meant that each project was simple enough for one NIE staff member, rather than a team, to monitor.
Similarly, because our contractors did not need vast interdisciplinary teams of researchers, they
experienced fewer managerial problems. Because projects were relatively self-contained, a problem
or failure in one did not threaten the whole study. We were, finally, able to conduct backup studies to
protect ourselves against the possible failure of very crucial or difficult efforts. (Pincus, ed., 1980,
p. 6?)

Though this evaluation effort was large and lasted several years, the basic logic of
risk-spreading also applies to small scale efforts.

Politics

Expect politics and make the most sensible use of them.

Additional advice can be found in many other sources (Patton, 1978; Flaherty and Windle,
1980; Levine and Williams, 1971; Morrill and Francis, 1979; Baumheier et al., 1977).



Pre-Evaluation Preparations (Homework) 


If an evaluation is under consideration, a simple sequence of thoughts and actions may enlighten
the decision to proceed by exploring the purposes and uses an evaluation might serve. The
sequence begins with two interrelated questions:
1. What is (are) the specific issue(s) or problem(s) that an evaluation is intended to address?
This is probably one of the two most important questions which can be raised about a proposed
evaluation. It is also one which may be ignored or skirted in the belief that evaluation is good in its own
right. A detailed specification of the problem(s) or issue(s) that is (are) to be addressed is a
prerequisite to judging whether an evaluation is appropriate. It represents essential homework
(Morrill and Francis, 1979).

One approach to elaborating program or policy problem(s) is to specify a set of questions to be
answered. This can be done, obviously, "in the head" by a "thought experiment" (what would it be
like if . . . ?; Wildavsky, 1979) or by some "back-of-the-envelope" jottings and calculations. A more
structured approach is a program evaluation issue paper, a short written statement that lays out in
tentative terms:

• The perceived nature and apparent structure of the program problem(s);

• Likely sources of the problem(s);

• Known and suspected evidence of the existence of the problem(s);

• Alternative actions that might be taken by the agency to remedy the problem(s);

• Indicators which might be employed to show progress toward resolving the problem(s);

• Estimated costs (of many kinds) and impacts of possible remedies for the perceived problem(s);

• Significant known or likely constraints on reducing the problem(s);

• Major evaluative, analytic or data problems that have to be faced and handled if further study
and investigation is to proceed;

• A list of key steps in additional study or investigation that might be taken and an estimate of their
cost, skill requirements and timing; and

• An identification of expected users of study results.

A program evaluation issue paper is a way to identify and describe the main features of a program
problem(s) based on what is known or can be easily learned from existing sources. It is preliminary to
more extensive evaluation, analysis or data collection. A well-developed issue paper should indicate
whether a given program issue or problem can be clarified by further evaluation, analysis, better
estimates of costs or impacts, a more refined understanding of the sources of a problem(s), or by
some other action or response.

Results of this pre-evaluation should help indicate whether additional steps ought to be a
management analysis, cost-effectiveness study, use of a trouble-shooter, an exploratory evaluation
or some other action, mechanism or form of structured analytical work. Details of the contents and
formats of two alternative issue papers, useful in both program analysis and program evaluation, are
provided in Hatry et al. (1976). An intriguing case study of the "swine flu affair" and what a useful
program/policy issue paper might look like the next time around have been prepared by Neustadt and
Fineberg (1978).

A principal purpose served by clarifying and elaborating major program and policy issues and
problems before starting more intensive evaluative work and data collection is to ensure that tools
and approaches are selected to fit problems rather than vice-versa.



2. Who are the expected/likely users of evaluation results? The answers to the first question and
to this one are interrelated. As a growing number of experienced participant-observers have
confirmed, relevant and useful evaluations do not grow out of idle curiosity, abstract concerns with
science or the public interest or an academic interest in splitting intellectual hairs. They grow instead
out of the live (sometimes nagging) questions, issues, problems and "felt" needs for information of
involved and participating program leaders, managers, staff and other key program influentials.

As Patton (1978) has urged, expected users of evaluation information should be identified early.
This should not be a guessing game. Interact with prospective users. Discuss a possible study with
them. Elicit their questions and concerns framed in their terms. Indicate realistically what types of
information are likely to be generated, the probable quality of that information (including its
limitations), timing, and so on. The skeleton of a program evaluation issue paper may be useful here.
Do some additional legwork and then return for additional discussion with these potential users.

3. With a set of preliminary questions and likely users well in mind, have someone staff it out and
scout around to get some preliminary feel for (a) the program's operations through interviews, site
visits and examination of program reports; (b) the feasibility of carrying out the kind of inquiry you had
in mind; (c) the level of effort, cost, timing and skills it might require; and (d) what might reasonably be
expected to result. Scouting around may also contribute to the development of an issue paper.

4. If at this stage you decide to proceed with some variety of formal evaluation activity, consider
the utility and value of a rapid feedback evaluation, an exploratory evaluation or "evaluability
assessment" (Wholey, 1979; Schmidt et al., 1979). These further pre-evaluation steps may be more
extensive than those discussed so far but they cover some of the same preliminary steps required by
a full-blown formal evaluation. Details of one possible approach to an "evaluability assessment" can
be found in the Schmidt et al. monograph published by Project SHARE, Evaluability Assessment:
Making Public Programs Work Better, 1979.

As noted earlier, the formalization of evaluability assessment is relatively new. It was designed for
use at the Federal level and is still in a developmental stage. It appears to grow partly out of the failure
of traditional research modes of evaluation to pay off and partly out of the growing recognition over the
past 5 to 10 years that under many circumstances a full formal evaluation will be neither feasible nor
desirable. The principals who developed this approach report that there is no packaged experience
available on its strengths, weaknesses or the conditions under which it pays off. They urge, as we do,
restraint and caution in its use.

Useful Practices and Rules of Thumb 

Fit Tools to Problems 

Attempt to fit evaluation tools, methods and approaches to the nature and structure of perceived
program problems rather than vice-versa. It is commonplace to find tools in search of problems and
methods in search of applications.

Know Your Evaluator 

Get to know your evaluator(s), their training, past work, style of thought, and preferred tools and
approaches. Evaluators and other professionals are predisposed to do what they know best; i.e.,
their specialty. In a caricatured health-care analogy, surgeons cut, dentists drill, psychiatrists probe
the mind and nutritionists explore eating habits. Where will the attention of your evaluator be drawn?
To methods, to models, to tests and measures, to questionnaires, to interviews, to "gestalt" patterns,
to qualitative considerations (like the history and context of the program), to philosophy? To
input-output relationships, to the "black box" in between, to individual client outcomes, to program
processes, to broad community impacts, to administrative and management mechanisms? To the
political and bureaucratic environment? A reasonable exploration of the evaluator's predispositions,
style of thinking and areas of professional comfort will permit more fruitful interaction between the
client(s) and evaluator(s) and a more productive negotiated process of evaluation.

Consider Evaluation an Interactive Process

Do not deprive the evaluator of your concerns, the problems you perceive, areas of greater and
lesser importance, blind spots, taboo issues, or unusual constraints on the agency and its range of
possible corrective program actions. Sponsors of evaluation who do not articulate their concerns are
not likely to get in return what they consider useful.
Similarly, encourage and ensure that evaluators interact with knowledgeable program officials and
operatives not only at the start of a study but at regular intervals along the way. This will serve two
purposes. First, it will permit the evaluator to access the qualitative and experiential knowledge,
insights and understanding essential to reality-based evaluation. Much of this knowledge comes
only from being involved in the historical development and daily business of program operations. No
new evaluator(s) can approximate the collective wisdom and insights about a program of those who
have been "dwelling" in it. Second, these interactions should (a) provide the evaluator with an
enhanced understanding of the human roles and perspectives at work in the program; (b) increase
the evaluator's knowledge of the details of the program's actual operations; (c) reduce the threat to
and anxiety of the evaluator which may be associated with limited understanding of the program; and
(d) forestall trips down technical alleys in search of answers that may be at the fingertips of the
experienced program official.

Do Not Isolate the Evaluator 

Do not isolate the evaluator with the misguided intention of protecting his or her objectivity or
impartiality. Myths to the contrary, evaluation properly employed serves concrete purposes and
interests and not abstract notions of science or the public interest. Encourage a flexible overall study
approach that permits both the agency and the evaluator to suggest midcourse corrections. Within
the boundaries of reasonableness and of prior commitments made to the evaluator, do not be
reluctant to interfere in the course a study may take. Some agencies attempt to increase the
articulation between their evolving interests and outside evaluators by considering the agency's
evaluation monitor an integral part of the evaluation study team.

Demand/Prepare Intelligible Reports 

It is a truism, regularly ignored, that findings of an evaluation that are not presented in an accessible
and intelligible form will not be easily used (Larson, 1979). In another place, the author and
colleagues (Hatry et al., 1976) give some basic advice on the presentation of the results of program
analysis. It applies with equal force to evaluation reports:

Some of the most sophisticated and technically competent program analyses [read: evaluations]
are unused and unusable. The reasons are varied: the main findings of the analysis may have
vanished in a thicket of technical jargon, the recommended alternatives may be politically infeasible,
the report on the analysis may have come too late, or the bureaucracy that must use the findings may
be uninterested or resistant. In brief, program analysis [evaluation] can be elegant but irrelevant. (p. 9)

The authors further advise: have the report reviewed for technical quality and clarity; include
minority reports; include a clear, compact summary; acknowledge the limitations and assumptions of
the study; use simple graphics to display major findings and conclusions; eliminate jargon; and tailor
the presentation of results to the communication style of expected key users (pp. 24-25).

Ask Evaluators for Qualitative Reporting and Judgments 

Some of the most insightful and helpful reporting in evaluation studies may have little or nothing to
do with the results of applying formal study methods. As implied earlier, evaluators usually come to
the end of their formal methods before they come to the end of their wits. The observations and
insights generated casually during the course of a study should be openly reported. One way to
ensure this is to encourage evaluators to devote special sections of the report to qualitative reporting
and personal interpretations.



Keep the Evaluator(s) Involved in Technical Assistance

For a variety of reasons, including the way some are employed (through time-limited contracts),
evaluators may be "hit and run." But the presentation of study reports is but a beginning or midpoint of
program improvement. If remedial action is agreed upon by the agency, ensure that evaluators are,
when possible, available to assist in effecting corrective action. This will not only sustain a blend of
evaluator and program skills and knowledge, but also discourage the evaluator from formulating
impractical recommendations that he/she may later have to help implement.


